How Androidify leverages Gemini, Firebase and ML Kit

Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager

We're bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we're releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.

In this post, we'll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we'll share some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.

App flow

The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

[Flow chart demonstrating the Androidify app flow]
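In short, and as detailed in the sections below: the user takes a photo or enters a text prompt; Gemini validates the photo; Gemini generates a detailed caption of the person in the photo; that caption (or the user's freeform prompt) is folded into an Imagen prompt; and Imagen generates the bot image that is displayed on screen.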

Gemini and image validation

To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.

Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and assessing image safety, including whether the image contains abusive content.

val jsonSchema = Schema.obj(
    properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
    optionalProperties = listOf("error"),
)

val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-preview-04-17",
        generationConfig = generationConfig {
            responseMimeType = "application/json"
            responseSchema = jsonSchema
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
        ),
    )

val response = generativeModel.generateContent(
    content {
        text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria... (for more details see the full sample)")
        image(image)
    },
)

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content

In the snippet above, we're leveraging the structured output capabilities of the model by defining the schema of the response. We're passing a Schema object via the responseSchema param in the generationConfig.

We want to validate that the image has enough information to generate a nice Android avatar, so we ask the model to return a JSON object with success = true/false and an optional error message explaining why the image doesn't have enough information.

Structured output is a powerful feature enabling a smoother integration of LLMs into your app by controlling the format of their output, similar to an API response.
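Because the response is guaranteed to match the schema, you can also deserialize it into a typed object instead of walking the JSON tree by hand. Here is a minimal sketch using kotlinx.serialization; the ValidationResult class and showError helper are our own illustration, not part of the Androidify app:

import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Hypothetical data class mirroring the response schema defined above.
@Serializable
data class ValidationResult(
    val success: Boolean,
    val error: String? = null, // optional in the schema, so nullable here
)

// ignoreUnknownKeys guards against any extra fields the model might emit.
val json = Json { ignoreUnknownKeys = true }

val result = json.decodeFromString<ValidationResult>(response.text!!)
if (!result.success) {
    // Surface the model's explanation, e.g. "the person is out of focus".
    showError(result.error) // showError is a hypothetical UI helper
}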

Image captioning with Gemini Flash

Once it's established that the image contains sufficient information to generate an Android avatar, it is captioned using Gemini 2.5 Flash with structured output.

val jsonSchema = Schema.obj(
    properties = mapOf(
        "success" to Schema.boolean(),
        "user_description" to Schema.string(),
    ),
    optionalProperties = listOf("user_description"),
)
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
    content {
        text(prompt)
        image(image)
    },
)

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true

val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content

The other option in the app is to start with a text prompt. You can enter details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.
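In that case the captioning step is skipped and the user's freeform text stands in for the generated description. A minimal sketch of how the two paths could converge; the UserInput types and function names here are our own illustration, not the app's actual navigation:

import android.graphics.Bitmap

// Hypothetical input types for the two entry points.
sealed interface UserInput {
    data class Photo(val bitmap: Bitmap) : UserInput
    data class Prompt(val text: String) : UserInput
}

// Both paths produce a description to feed into the Imagen prompt.
// describePhotoWithGemini is a hypothetical wrapper around the captioning call above.
suspend fun describeBot(input: UserInput): String = when (input) {
    is UserInput.Photo -> describePhotoWithGemini(input.bitmap) // captioning path
    is UserInput.Prompt -> input.text // freeform text path
}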

Android generation via Imagen

We'll use this detailed description of your image to enrich the prompt used for image generation. We'll add extra details around what we'd like to generate and include the bot color selection as part of this too, along with the skin tone selected by the user.

val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and simple, facing directly forward [...] The bot looks as follows $userDescription [...]"
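The bot color and skin tone mentioned above can be interpolated into the same template. A minimal sketch; buildImagenPrompt and its parameters are our own illustration, not the app's actual prompt builder:

// Hypothetical helper: combines the Gemini caption with the user's selections.
fun buildImagenPrompt(userDescription: String, botColor: String, skinTone: String): String =
    "A 3D rendered cartoonish Android mascot in a photorealistic style, " +
        "the pose is relaxed and simple, facing directly forward. " +
        "The bot is $botColor. The person has a $skinTone skin tone. " +
        "The bot looks as follows: $userDescription"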

We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:

// we supply our own fine-tuned model here but you can use "imagen-3.0-generate-002"
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
    "imagen-3.0-generate-002",
    safetySettings = ImagenSafetySettings(
        ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
        personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
    ),
)

val response = generativeModel.generateImages(imagenPrompt)

val image = response.images.first().asBitmap()

And that's it! The Imagen model generates a bitmap that we can display on the user's screen.
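In a Compose UI, displaying that bitmap is then straightforward. A minimal sketch; the composable is our own illustration, not from the sample app:

import android.graphics.Bitmap
import androidx.compose.foundation.Image
import androidx.compose.runtime.Composable
import androidx.compose.ui.graphics.asImageBitmap

// Hypothetical composable showing the generated bot.
@Composable
fun BotResult(botBitmap: Bitmap) {
    Image(
        bitmap = botBitmap.asImageBitmap(), // convert the Android Bitmap for Compose
        contentDescription = "Generated Android bot",
    )
}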

Fine-tuning the Imagen model

The Imagen 3 model was fine-tuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models. Instead of updating the entire model, LoRA adds smaller, trainable "adapters" that make small changes to the model's behavior. We ran a fine-tuning pipeline on the generally available Imagen 3 model with Android bot assets of different color combinations and different assets for enhanced cuteness and fun. We generated text captions for the training images, and the image-text pairs were used to fine-tune the model effectively.
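For reference, the standard LoRA formulation (not specific to this app) freezes each pretrained weight matrix $W \in \mathbb{R}^{d \times k}$ and learns only a low-rank update:

$$W' = W + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$

Only $B$ and $A$ are trained, a small fraction of the parameters in $W$, which is what keeps the fine-tuning pipeline cheap to run.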

The current sample app uses a standard Imagen model, so the results may look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of the Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year, and we're also planning on adding support for fine-tuned models to the Firebase AI Logic SDK later in the year.

[Animated image: the Androidify app turns a selfie of a bearded man wearing a black t-shirt, sunglasses, and a blue backpack into a green 3D bearded droid wearing a black t-shirt and sunglasses with a blue backpack]

The original image… and the Androidifi-ed image

ML Kit

The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which triggers the capture button and adds visual indicators.

To do this, we add the SDK to the app and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detectedLandmarks that are in the streaming image coming from the Camera, and we set the _uiState.detectedPose to true if a nose and shoulders are visible:

private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    val detectedLandmarks = mutableListOf<Int>()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}
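The detectedPose flag can then drive the shutter button's enabled state. A minimal sketch, assuming a Compose screen that collects the same uiState; CameraViewModel and the button composable are our own illustration:

import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.lifecycle.compose.collectAsStateWithLifecycle

// Hypothetical: the shutter is enabled only while a pose is detected.
@Composable
fun ShutterButton(viewModel: CameraViewModel, onCapture: () -> Unit) {
    val uiState by viewModel.uiState.collectAsStateWithLifecycle()
    Button(
        onClick = onCapture,
        enabled = uiState.detectedPose, // set by runPoseDetection() above
    ) {
        Text("Capture")
    }
}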

[Animated image: the camera shutter button activating when an orange droid figurine is held in the camera frame]

The camera shutter button is activated when a person (or a bot!) enters the frame.

Get started with AI on Android

The Androidify app makes extensive use of Gemini 2.5 Flash to validate the image and generate a detailed description used to generate the image. It also leverages the specifically fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, the ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.

To get started with AI on Android, go to the Gemini and Imagen documentation for Android.

Find this announcement and all Google I/O 2025 updates on io.google starting May 22.
