I’m really curious about which option is more popular. I have found, that format JSON works great even for super small models (e.g. Llama 3.2-1B-Q4 and Qwen-2.5-0.5B-Q4) which is great news for mobile devices!
But the strictly defined layout of function calling can be very alluring as well, especially since we could have an LLM write the layout given the full function text (as in, the actual code of the function).
I have also tried to ditch the formatting bit completely. Currently I am working on a translation-tablecreator for Godot, which requests a translation individually for every row in the CSV file. Works mostly great!
I will try to use format JSON for my project, since not everyone has the VRAM for 7B models, and it works just fine on small models. But it does also mean longer generation times… And more one-shot prompting, so longer first-token-lag.
Format JSON is too useful to give up for speed.