AI assistant that helps you locate and open programs using text or voice input.
The program takes a screenshot of your current screen and sends it to Gemini with a prompt asking it to identify the location of the program you requested and (if the checkbox is enabled) try to open it.
It is recommended to use the program with a plain background, where the program icons can be seen easily.
Currently, it uses Gemini 2.5 Flash, the most recent and advanced version of the model. Even so, it will often make mistakes; hopefully newer versions will be sharper.
Important
Be aware that Gemini doesn't actually know which programs are installed on your device and can make mistakes, so enable the checkbox that lets Gemini try to open the program only if you are sure nothing unexpected can happen.
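The flow described above (screenshot, prompt, optional open step) can be sketched roughly as follows. This is not the script's actual code: the function names build_prompt and locate_program, the prompt wording, and the model id "gemini-2.5-flash" are assumptions made for illustration. Only the library calls (pyautogui.screenshot, genai.configure, GenerativeModel.generate_content) come from the packages listed in this README.

```python
def build_prompt(request: str, try_to_open: bool) -> str:
    """Compose the text sent with the screenshot (illustrative wording)."""
    prompt = (
        "Here is a screenshot of my screen. "
        f"Find where '{request}' is located and describe its position."
    )
    if try_to_open:
        # Only ask for something clickable when the user opted in.
        prompt += " Also give the pixel coordinates of its icon so it can be clicked."
    return prompt


def locate_program(request: str, api_key: str, try_to_open: bool = False) -> str:
    """Hypothetical helper: send the current screen plus a prompt to Gemini."""
    # Imported lazily so build_prompt works without the third-party packages.
    import pyautogui
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-2.5-flash")  # assumed model id
    screenshot = pyautogui.screenshot()  # returns a PIL image
    response = model.generate_content(
        [build_prompt(request, try_to_open), screenshot]
    )
    return response.text
```

Keeping try_to_open off by default mirrors the warning above: the model only guesses from pixels, so acting on its answer should be an explicit choice.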
You'll need two things to use this program: a Gemini API key and the required Python libraries.
Obtain your Gemini API Key by visiting Google AI Studio. Ensure you are logged into your Google account, then press the blue button that says 'Create API key' and follow the steps to set up your Google Cloud Project and retrieve your API key. Make sure to save it in a safe place.
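One common way to keep the key out of the script itself is to read it from an environment variable. The sketch below assumes a variable named GEMINI_API_KEY and a helper called load_api_key; both are hypothetical, and the actual script may ask for the key in a different way.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a readable message instead of a later API error.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Gemini API key."
        )
    return key
```

With this approach the key never appears in the source file, so it cannot be shared by accident along with the script.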
Google allows free use of this API without adding billing information, but there are some limitations.
In Google AI Studio, you can monitor your API usage by clicking 'View usage data' in the 'Plan' column where your projects are displayed. I recommend checking the 'Quota and system limits' tab and sorting by 'actual usage percentage', as it provides more detailed information.
Then, download or clone the Python script and run this command in the same folder:
python -m pip install pystray pillow speechrecognition google-generativeai pyaudio pyautogui
If that fails, or if you have a different Python setup (for example, the py launcher on Windows), try:
py -m pip install pystray pillow speechrecognition google-generativeai pyaudio pyautogui
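To confirm the installation worked, a short check like the one below can report which packages are still missing. It is a sketch, not part of the original script. The helper name missing_packages is hypothetical, and the mapping pairs each pip package name with the module name it is normally imported under.

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "pystray": "pystray",
    "pillow": "PIL",
    "speechrecognition": "speech_recognition",
    "google-generativeai": "google.generativeai",
    "pyaudio": "pyaudio",
    "pyautogui": "pyautogui",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    missing = []
    for package, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ImportError:
            # Raised when the parent of a dotted module name is absent.
            found = False
        if not found:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the same interpreter you used for the install command (python or py), since each Python installation keeps its own set of packages.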