Inspiration
Phage transforms a standard smartphone into an autonomous digital symbiote that lives within its "physical habitat." We were inspired by the idea that an AI shouldn't just process text; it should perceive the world around its device and act on it directly.
What it does
Phage transforms a standard smartphone into an autonomous digital symbiote that lives within its "physical habitat." It doesn't just process text; it perceives and interacts with the world through four core capacities:
Multimodal Perception (Vision): Phage uses the device’s camera as its eyes. It can take snapshots of its surroundings, analyze them using Gemini 2.5 Flash, and provide the user with real-time insights or identify objects to help navigate the physical world.
Natural Voice Interaction: Moving away from rigid text interfaces, Phage engages in a native voice-to-voice loop. It "hears" the host's tone and intent directly from audio notes and responds with high-fidelity, expressive speech, making the interaction feel human and immediate.
Hardware Agency: This is where Phage breaks the screen barrier. By bridging the cloud to a local Termux environment, Phage can execute physical actions on the device. It can toggle the flashlight, manage system volume, trigger haptic feedback, and run complex shell scripts to manage the phone's resources autonomously.
Intelligent Intent Routing: Phage knows when to talk and when to act. It serves as a "prefrontal cortex" for the phone, deciding whether a request calls for a conversation or a direct hardware intervention.
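The routing step above can be sketched as a small classifier. In the real system Gemini 2.5 Flash makes this decision; the keyword heuristic below is only an illustrative stand-in, and the action names ("torch", "vibrate", "volume") are hypothetical.

```python
# Simplified sketch of Phage's intent routing. The production router asks
# Gemini to classify each request; this keyword table is a stand-in, and
# the action names are hypothetical.
HARDWARE_KEYWORDS = {
    "flashlight": "torch",
    "torch": "torch",
    "vibrate": "vibrate",
    "volume": "volume",
}

def route_intent(message: str) -> dict:
    """Decide whether a request needs a conversation or a hardware action."""
    lowered = message.lower()
    for keyword, action in HARDWARE_KEYWORDS.items():
        if keyword in lowered:
            # Hardware intervention: emit a command for the device bridge.
            return {"mode": "hardware_action", "action": action}
    # Anything else falls through to a normal conversational reply.
    return {"mode": "conversation"}

print(route_intent("Turn on the flashlight"))   # hardware_action / torch
print(route_intent("How was your day, Phage?")) # conversation
```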
How we built it
Phage is built as a four-part system that connects a cloud-based AI to a physical Android smartphone.
The Intelligence (Gemini 2.5 Flash): We used the Google GenAI SDK to power the system’s "thinking." We chose Gemini 2.5 Flash because it can process voice and images directly. This allows the user to talk to Phage naturally without needing a middle step to convert speech into text first.
The Cloud Backend (Google Cloud Platform):
Cloud Functions: We wrote our main code in Python and hosted it on Google Cloud Functions. This serves as the "operator" that receives messages from the user and sends them to the Gemini AI for a decision.
Firestore Database: We used Firestore to connect the cloud to the phone. When the AI decides to perform an action (like turning on a light), the Cloud Function saves that command to Firestore. This acts as a real-time "to-do list" that the phone can check instantly.
Text-to-Speech: To allow the AI to speak back, we integrated the Google Cloud Text-to-Speech API. This converts the AI's written response into a high-quality voice file.
The Phone Execution (Android & Termux): On the actual smartphone, we used an app called Termux, which allows us to run code directly on the Android system. We wrote a script that constantly checks the Firestore "to-do list." When it sees a new command from the cloud, it executes it on the phone—allowing the AI to control the flashlight, the camera, and system settings.
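The executor side of that script can be sketched as a translation table from command documents to Termux:API invocations. The `termux-torch`, `termux-vibrate`, and `termux-volume` commands come from the termux-api package; the command schema here is our assumption.

```python
import subprocess

# Sketch of the Termux-side executor: it translates a command document
# (schema assumed) into a Termux:API shell call. termux-torch,
# termux-vibrate, and termux-volume are provided by the termux-api package.

def to_shell(command: dict) -> list[str]:
    """Map a command document to a termux-api argv list."""
    action = command["action"]
    args = command.get("args", {})
    if action == "torch":
        return ["termux-torch", args.get("state", "on")]
    if action == "vibrate":
        return ["termux-vibrate", "-d", str(args.get("duration_ms", 500))]
    if action == "volume":
        return ["termux-volume", "music", str(args.get("level", 5))]
    raise ValueError(f"unknown action: {action}")

def execute(command: dict) -> None:
    """Run the mapped command on the device."""
    subprocess.run(to_shell(command), check=True)
```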
The User Interface (Telegram): We used the Telegram Bot API as the primary way to interact with the system. This allows the user to send text, voice notes, or photos from anywhere in the world and receive a voice or action response from their phone.
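The reply path can be sketched as one call to the Bot API's sendVoice method, which expects OGG/Opus audio, the same format the Text-to-Speech step can emit. The token and chat id below are placeholders.

```python
# Sketch of the reply path: synthesized OGG audio goes back to the user
# via the Telegram Bot API's sendVoice method. Token and chat id are
# placeholders.
API_BASE = "https://api.telegram.org"

def send_voice_url(bot_token: str) -> str:
    """Build the Bot API endpoint for sending a voice note."""
    return f"{API_BASE}/bot{bot_token}/sendVoice"

def send_voice(bot_token: str, chat_id: int, ogg_bytes: bytes) -> None:
    import requests  # pip install requests
    requests.post(
        send_voice_url(bot_token),
        data={"chat_id": chat_id},
        files={"voice": ("reply.ogg", ogg_bytes, "audio/ogg")},
        timeout=30,
    ).raise_for_status()
```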
Challenges we ran into
Building a system that connects a cloud-based brain to physical hardware in real-time wasn't easy. Here are the main hurdles we had to overcome:
The "Sync" Gap: One of the biggest challenges was making communication between the Google Cloud backend and the Android device feel instant. Early versions suffered from significant latency. We had to optimize how the phone polls the database so that when you tell Phage to do something, it happens almost immediately without draining the battery.
Multimodal Complexity: Handling different types of data at once—like a voice note combined with a text instruction—caused several "Invalid Argument" errors in the early stages. We had to spend a lot of time learning the specific way the Google GenAI SDK expects data to be formatted so that the AI could "hear" and "read" at the same time without crashing.
Handling High Volume: Because we send voice notes back and forth, we quickly hit API rate limits. This taught us to manage a billed Google Cloud account and to optimize our code for the large amounts of data that high-quality voice interaction requires.
The Hardware Bridge: Getting a cloud server to talk to a phone’s internal hardware (like the flashlight or vibration motor) is not a standard feature. We had to build a custom "bridge" using Termux and shell scripting to translate the AI's digital decisions into physical actions.
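One way to close the "sync" gap described above without draining the battery is adaptive polling: poll quickly right after recent activity and back off exponentially while the queue stays empty. The intervals below are illustrative, not the values we shipped.

```python
# Adaptive polling sketch for the device-side loop: shrink the interval
# when commands arrive, grow it while idle. All intervals are
# illustrative assumptions.
MIN_INTERVAL = 1.0    # seconds between polls while active
MAX_INTERVAL = 30.0   # ceiling while idle

def next_interval(current: float, saw_command: bool) -> float:
    """Shrink the polling interval on activity, grow it when idle."""
    if saw_command:
        return MIN_INTERVAL
    return min(current * 2, MAX_INTERVAL)
```

This keeps the perceived latency near MIN_INTERVAL during a conversation while the phone sleeps most of the time when nothing is happening.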
Accomplishments that we're proud of
Making it Work: We are proud that we successfully made an AI in the cloud control the physical parts of a phone.
The Conversation: We created a smooth system where you can talk back and forth with the AI as if it were a real person.
High Tech on a Budget: We proved that you don't need an expensive phone to use advanced AI. We got Phage running perfectly on a $17 smartphone, showing that anyone can have access to this technology.
What we learned
Intelligence is Independent of Hardware Price: One of our biggest lessons was that a $17 smartphone can be just as smart as a flagship device when it is powered by Gemini 2.5 Flash. We learned that AI is the ultimate equalizer—it can turn affordable hardware into a powerful, high-end assistant.
The Importance of Perception: We discovered that AI becomes much more helpful when it can "see" and "hear" its surroundings. Moving from a text-only chatbot to a multimodal agent changed how we think about the relationship between humans and their devices.
Cloud Synergy: We learned how to effectively combine different Google Cloud services. Seeing how Cloud Functions, Firestore, and Gemini work together to create a "nervous system" for a physical device was an eye-opening experience in scalable engineering.
What's next for Phage
App Navigation: We are working on giving Phage the ability to "see" screenshots and navigate other apps on the phone (like WhatsApp or a calendar) by tapping and swiping the screen on its own.
Localized Language Support: Since we are based in Kigali, we plan to integrate Kinyarwanda language support. This will make Phage more accessible and helpful for the local community here in Rwanda.
A Self-Improving System: We want to build a feedback loop where Phage can learn from its actions. If a command fails, Phage will be able to analyze the error and try a different approach until it succeeds, making it truly autonomous.