I built a tool for mobile and computer use using local and remote LLMs
ClickClickClick
A framework to enable autonomous android and computer use using any LLM (local or remote)
Demos
create a draft gmail to [email protected] and ask them if they are free for lunch on coming saturday at 1PM. Congratulate on the baby - write one para.
https://github.com/user-attachments/assets/7cdbebb7-0ac4-4c20-8d67-f3c07cd4ab01
Can you open the browser at https://www.google.com/maps/ and answer the corresponding task: Find bus stops in Alanson, MI
https://github.com/user-attachments/assets/eb5dc968-206b-422d-aa3c-20c48bac3fed
start a 3+2 game on lichess
https://github.com/user-attachments/assets/68fc3475-2299-4254-8673-3123356177b5
Currently supporting local models via Ollama (Llama 3.2-vision), Gemini, GPT 4o. The current code is highly experimental and will evolve in future commits. Please use at your own risk.
The best result currently comes from using GPT 4o/4o-mini as planner and Gemini Pro/Flash as finder.
Prerequisites
- This project needs adb to be installed on the local machine where the code is executed.
- Enable USB debugging on the Android phone.
- Python >= 3.11
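The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the project: the function name `check_prerequisites` is hypothetical, and it only verifies that an `adb` executable is on PATH and that the interpreter is new enough. It does not check that USB debugging is enabled on the phone.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # shutil.which looks up an executable on PATH, much like a shell would.
    if shutil.which("adb") is None:
        problems.append("adb was not found on PATH")
    # Compare only (major, minor) against the required minimum version.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ is required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```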
How to install
Clone the repository and navigate into the project directory:
git clone https://github.com/BandarLabs/clickclickclick
cd clickclickclick
It is recommended to create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install the dependencies:
pip install -r requirements.txt
How to use
Put your model-specific settings in config/models.yaml and export the keys specified in the YAML file.
As CLI tool
Install the tool (ensure that the OPENAI_API_KEY and GEMINI_API_KEY API keys are set in the environment):
pip install https://github.com/user-attachments/files/18163076/click3-0.2.0.tar.gz
click3 run open uber app
As Script
Setup
By default, the planner is openai and the finder is gemini. You can change the default configuration in config/models.yaml.
Before running any tasks, you need to configure the respective keys, such as OPENAI_API_KEY and GEMINI_API_KEY, in the environment.
Gemini Flash gives 15 free API calls. Get a key at https://aistudio.google.com/apikey
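A quick way to confirm the keys are exported before running a task is sketched below. This is an illustration only: the helper `missing_keys` is hypothetical, and the two key names assume the default planner (openai) and finder (gemini). If config/models.yaml names other keys, pass those instead.

```python
import os

# Key names for the default planner/finder pair; adjust to match config/models.yaml.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    for name in missing_keys():
        print(f"{name} is not set in the environment")
```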
Running Tasks
To execute a task, use the run command. The basic usage is:
python main.py run <task-prompt>
Options
- --platform: Specifies the platform to use, either android or osx. Default is android.
  python main.py run "example task" --platform=osx
- --planner-model: Specifies the planner model to use, either openai, gemini, or ollama. Default is openai.
  python main.py run "example task" --planner-model=gemini
- --finder-model: Specifies the finder model to use, either openai, gemini, or ollama. Default is gemini.
  python main.py run "example task" --finder-model=ollama
Example
A full example command might look like:
python main.py run "Open Google news" --platform=android --planner-model=openai --finder-model=gemini
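If you want to launch tasks from another Python program, the command line above can be assembled programmatically. The sketch below is an assumption about how one might wrap the CLI, not code from the repository: `build_command` is a hypothetical helper, and the allowed values are copied from the option descriptions above. It must be run from the project directory so that main.py resolves.

```python
import subprocess
import sys

PLATFORMS = ("android", "osx")
MODELS = ("openai", "gemini", "ollama")


def build_command(task, platform="android", planner_model="openai", finder_model="gemini"):
    """Build the argv list for `python main.py run ...`, rejecting undocumented values."""
    if platform not in PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")
    if planner_model not in MODELS:
        raise ValueError(f"unsupported planner model: {planner_model}")
    if finder_model not in MODELS:
        raise ValueError(f"unsupported finder model: {finder_model}")
    return [
        sys.executable, "main.py", "run", task,
        f"--platform={platform}",
        f"--planner-model={planner_model}",
        f"--finder-model={finder_model}",
    ]


if __name__ == "__main__":
    # Passing a list (not a shell string) avoids quoting problems in the task prompt.
    subprocess.run(build_command("Open Google news"), check=True)
```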
Use as an API
To run the app
uvicorn api:app
POST /execute
Description:
This endpoint executes a task based on the provided task prompt, platform, planner model, and finder model.
Request Body:
- task_prompt (string): The prompt for the task that needs to be executed.
- platform (string, optional): The platform on which the task is to be executed. Default is "android". Supported platforms: "android", "osx".
- planner_model (string, optional): The planner model to be used for planning the task. Default is "openai". Supported models: "openai", "gemini", "ollama".
- finder_model (string, optional): The finder model to be used for finding elements to interact with. Default is "gemini". Supported models: "gemini", "openai", "ollama".
Response:
- 200 OK:
  - result (object): The result of the task execution.
- 400 Bad Request:
  - detail (string): Description of why the request is invalid (e.g., unsupported platform, unsupported planner model, unsupported finder model).
- 500 Internal Server Error:
  - detail (string): Description of the error that occurred during task execution.
Example Request:
curl -X POST "http://localhost:8000/execute" -H "Content-Type: application/json" -d '{
"task_prompt": "Open uber app",
"platform": "android",
"planner_model": "openai",
"finder_model": "gemini"
}'
Example Response:
{"result":true}
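The same request can be sent from Python using only the standard library. This is a sketch under the assumption that the server is running locally on port 8000 as in the curl example. The helpers `build_request` and `execute_task` are hypothetical names and are not part of the project.

```python
import json
import urllib.request


def build_request(task_prompt, platform="android", planner_model="openai",
                  finder_model="gemini", base_url="http://localhost:8000"):
    """Build a POST request for /execute with the documented JSON body."""
    payload = {
        "task_prompt": task_prompt,
        "platform": platform,
        "planner_model": planner_model,
        "finder_model": finder_model,
    }
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_task(task_prompt, **kwargs):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_request(task_prompt, **kwargs)) as response:
        return json.load(response)


if __name__ == "__main__":
    print(execute_task("Open uber app"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses described above, so callers that want the `detail` message need to catch that exception and read its body.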
How to contribute
Contributions are welcome! Please begin by opening an issue to discuss your ideas. Once the issue is reviewed and assigned, you can proceed with submitting a pull request.
Things to do
- Enable local models via Ollama on Android
- Make computer use fully functional
License
This project is licensed under the MIT License. See the LICENSE file for details.