MLC Chat is already on the App Store and allowed to be used. I haven't used Yi with it, but a quantized Mistral or Llama runs quite well on an iPhone 15; see https://llm.mlc.ai. An "Apple GPT" is also rumored to be coming.
It's processor-intensive, and therefore battery-intensive, but even now it won't kill your battery inside of 30 minutes. Obviously resource usage would be worse than a standalone app if it were kept running by some OS-level process and used as the processing layer for every trivial thing, but a cheaper input-handling layer could decide whether a given input is worth promoting to LLM evaluation at all.
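A rough sketch of what that tiered gating might look like, with invented names (this is not MLC's API, and the heuristics are just placeholders for whatever cheap classification the OS would actually use):

```swift
import Foundation

// Hypothetical tiered handler: a cheap gate decides whether an input is
// worth escalating to the expensive, battery-hungry on-device LLM.

enum InputVerdict {
    case handledCheaply(String)   // resolved by simple rules, no LLM spin-up
    case escalateToLLM            // ambiguous/complex enough to justify the cost
}

struct CheapGate {
    // Trivial heuristics stand in for a small on-device classifier.
    func evaluate(_ input: String) -> InputVerdict {
        let trimmed = input.trimmingCharacters(in: .whitespacesAndNewlines)
        if trimmed.count < 4 {
            return .handledCheaply("")                 // too short to mean anything
        }
        if let url = URL(string: trimmed), url.scheme != nil {
            return .handledCheaply("open:\(trimmed)")  // plain URL: just open it
        }
        if trimmed.lowercased() == "what time is it" {
            return .handledCheaply(DateFormatter.localizedString(
                from: Date(), dateStyle: .none, timeStyle: .short))
        }
        return .escalateToLLM
    }
}

// The expensive path is only reached when the gate says so. The `generate`
// closure stands in for whatever the local model runtime exposes.
func handle(_ input: String,
            gate: CheapGate,
            generate: (String) -> String) -> String {
    switch gate.evaluate(input) {
    case .handledCheaply(let result):
        return result
    case .escalateToLLM:
        return generate(input)                         // model only spins up here
    }
}
```

Most interactions would resolve in the cheap branch, so the model (and the battery hit) only comes into play for the minority of inputs that actually need it.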