gittech. site

for different kinds of informations and explorations.

UI-Tars Desktop by ByteDance

Published at
Jan 23, 2025

UI-TARS

UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

   πŸ“‘ Paper    | πŸ€— Hugging Face Models   |   πŸ«¨ Discord   |   πŸ€– ModelScope  
πŸ–₯️ Desktop Application    |    πŸ‘“ Midscene (use in browser)

⚠️ Important Announcement: GGUF Model Performance

The GGUF model has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to downgrade it.

πŸ’‘ Alternative Solution: You can use Cloud Deployment or Local Deployment [vLLM](If you have enough GPU resources) instead.

We appreciate your understanding and patience as we work to ensure the best possible experience.

Updates

Showcases

Instruction Video
Get the current weather in SF using the web browser
Send a twitter with the content "hello world"

Features

  • πŸ€– Natural language control powered by Vision-Language Model
  • πŸ–₯️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • πŸ’» Cross-platform support (Windows/MacOS)
  • πŸ”„ Real-time feedback and status display
  • πŸ” Private and secure - fully local processing

Quick Start

Download

You can download the latest release version of UI-TARS Desktop from our releases page.

Note: If you have Homebrew installed, you can install UI-TARS Desktop by running the following command:

brew install --cask ui-tars

Install

MacOS

  1. Drag UI TARS application into the Applications folder
  1. Enable the permission of UI TARS in MacOS:
  • System Settings -> Privacy & Security -> Accessibility
  • System Settings -> Privacy & Security -> Screen Recording
  1. Then open UI TARS application, you can see the following interface:

Windows

Still to run the application, you can see the following interface:

Deployment

Cloud Deployment

We recommend using HuggingFace Inference Endpoints for fast deployment. We provide two docs for users to refer:

English version: GUI Model Deployment Guide

δΈ­ζ–‡η‰ˆ: GUIζ¨‘εž‹ιƒ¨η½²ζ•™η¨‹

Local Deployment [vLLM]

We recommend using vLLM for fast deployment and inference. You need to use vllm>=0.6.1.

pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
Download the Model

We provide three model sizes on Hugging Face: 2B, 7B, and 72B. To achieve the best performance, we recommend using the 7B-DPO or 72B-DPO model (based on your hardware configuration):

Start an OpenAI API Service

Run the command below to start an OpenAI-compatible API service:

python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
Input your API information

Note: VLM Base Url is OpenAI compatible API endpoints (see OpenAI API protocol document for more details).

Contributing

CONTRIBUTING.md

SDK(Experimental)

SDK

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}