
UI-TARS Desktop
UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Paper | Hugging Face Models | Discord | ModelScope | Desktop Application | Midscene (use in browser)
Important Announcement: GGUF Model Performance
The GGUF model has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to downgrade it.
Alternative Solution: You can use Cloud Deployment or Local Deployment [vLLM] (if you have enough GPU resources) instead.
We appreciate your understanding and patience as we work to ensure the best possible experience.
Updates
- 01.25: We updated the Cloud Deployment section in the Chinese deployment guide (GUI模型部署教程) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
Showcases
| Instruction | Video |
| --- | --- |
| Get the current weather in SF using the web browser | (demo video) |
| Send a twitter with the content "hello world" | (demo video) |
Features
- Natural language control powered by a Vision-Language Model
- Screenshot and visual recognition support
- Precise mouse and keyboard control
- Cross-platform support (Windows/macOS)
- Real-time feedback and status display
- Private and secure: fully local processing
Quick Start
Download
You can download the latest release version of UI-TARS Desktop from our releases page.
Note: If you have Homebrew installed, you can install UI-TARS Desktop by running the following command:
brew install --cask ui-tars
Install
MacOS
- Drag the UI TARS application into the Applications folder

- Enable the required permissions for UI TARS in macOS:
- System Settings -> Privacy & Security -> Accessibility
- System Settings -> Privacy & Security -> Screen Recording

- Then open the UI TARS application; you will see the main interface.

Windows
Just install and run the application; you will see the same interface.

Deployment
Cloud Deployment
We recommend using HuggingFace Inference Endpoints for fast deployment. We provide two guides for reference:
English version: GUI Model Deployment Guide
Chinese version: GUI模型部署教程 (GUI Model Deployment Tutorial)
Local Deployment [vLLM]
We recommend using vLLM for fast deployment and inference. You need to use vllm>=0.6.1.
pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
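As a quick sanity check after installation, here is a minimal sketch (assuming a standard vLLM install; the packaging library is available in most Python environments) to confirm that the installed version meets the >=0.6.1 requirement:

```python
# Sanity check: verify the installed vLLM version satisfies the requirement above.
import vllm
from packaging.version import Version

print("vLLM version:", vllm.__version__)
assert Version(vllm.__version__) >= Version("0.6.1"), "vLLM is too old; upgrade to >= 0.6.1"
```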
Download the Model
We provide three model sizes on Hugging Face: 2B, 7B, and 72B. To achieve the best performance, we recommend using the 7B-DPO or 72B-DPO model, depending on your hardware configuration.
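For illustration, a minimal sketch of fetching the weights with huggingface_hub is shown below; the repo id bytedance-research/UI-TARS-7B-DPO and the local directory are assumptions, so substitute the model variant that fits your hardware:

```python
# Sketch: download UI-TARS weights from Hugging Face for local vLLM serving.
# The repo_id and local_dir are assumptions; pick the 2B/7B/72B variant you need.
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="bytedance-research/UI-TARS-7B-DPO",  # assumed repo id; adjust to your chosen size
    local_dir="./UI-TARS-7B-DPO",                 # directory vLLM will load the weights from
)
print("Model downloaded to:", model_path)
```

The returned path can be passed directly as `<path to your model>` in the serving command below.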
Start an OpenAI API Service
Run the command below to start an OpenAI-compatible API service:
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
Input your API information

Note: The VLM Base URL is an OpenAI-compatible API endpoint (see the OpenAI API protocol documentation for more details).
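To verify the endpoint before entering it in the desktop app, a minimal sketch with the openai Python client might look like the following; the base URL assumes vLLM's default port 8000 on localhost, and the API key is a placeholder since vLLM accepts any key unless --api-key is set:

```python
# Sketch: query the OpenAI-compatible vLLM endpoint started above.
# base_url and api_key are assumptions for a default local vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

response = client.chat.completions.create(
    model="ui-tars",  # must match --served-model-name from the serving command
    messages=[{"role": "user", "content": "Describe what you would click to open Settings."}],
)
print(response.choices[0].message.content)
```

In real use the desktop app also sends screenshots as image inputs along with the instruction; this text-only call simply confirms the service is reachable and the model name matches.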
Contributing
SDK (Experimental)
License
UI-TARS Desktop is licensed under the Apache License 2.0.
Citation
If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}