BoorishBears's comments

Can tools return images to the LLM using this SDK? (where supported by the provider)

Yes, we support images, tools, response format (for OpenAI) and everything else.

I'm asking specifically about images being returned by a tool call. Skimming through the Zod types, I didn't see any indication that it's supported.

Most of these prompts come from LLMs, so it's trivial to instruct them to provide a string that's broken out like that.

Also not the end of the world to process stuff like this with a regex.

Most of these newer TTS models require this type of formatting to reliably state long strings of numbers and IDs.
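
To illustrate the regex approach: a minimal sketch in Go (the helper name and the "space out runs of two or more digits" rule are assumptions; real TTS providers want different formats).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// digitRun matches runs of two or more digits, e.g. account numbers and IDs.
var digitRun = regexp.MustCompile(`\d{2,}`)

// spaceDigits rewrites "48213" as "4 8 2 1 3" so the TTS model reads the
// number digit by digit instead of attempting "forty-eight thousand...".
func spaceDigits(s string) string {
	return digitRun.ReplaceAllStringFunc(s, func(m string) string {
		return strings.Join(strings.Split(m, ""), " ")
	})
}

func main() {
	fmt.Println(spaceDigits("Your confirmation code is 48213."))
	// Output: Your confirmation code is 4 8 2 1 3.
}
```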


I can't tell if this is satire, or if replicating 6 million years of evolution has legitimately become handwave material for Elon's supporters...

They are afraid. Times of crisis, especially planetary ones, always have the weaker-minded and scared rallying around figureheads. Some guy in an operetta uniform, exclaiming "I'm the captain, give me all your cash" while brandishing a detached steering wheel, is what the passengers want to see. Reality is a Lovecraftian horror too much to bear.

Mammalian vision and vision itself have been around a lot longer than 6 million years by at least one, likely two, orders of magnitude.

I don't know if you've tried this recently, but take a photo of something on your phone and put it into an AI.

There may even be an AI built into your photo library app.


The fact I work on self-driving cars makes me a tiny bit more of a realist than someone who thinks CLIP is proof of what AI can and can't do...

I'm curious. Can you elaborate on what CLIP proves about what AI can and can't do?

My point is that it doesn't.

The fact your phone can identify an object doesn't tell you anything about the capabilities of a self-driving car's vision stack. It's a complete non sequitur.


So your job is to, in your own words, be "replicating 6 million years of evolution"?

You know how big your own team is, and that your team is itself an abstraction from the outside world. You know you get the shortcuts of being able to look at what nature does and engineer it rather than simply copy without understanding. You know your own evolutionary algorithms, assuming you're using them at all, run as fast as you can evaluate the fitness function, and that that is much faster than the same cycle with human, or even mammalian, generational gaps.

> CLIP is proof of what AI can and can't do

CLIP says nothing about what AI can't do, but it definitely says what AI can do. It's a minimum, not a maximum.


Not to be rude, but you're arguing with somebody who works in what I would assume is a highly mathematical space, and asserting your opinion on how quickly that space can advance, while your own profile admits that you were unable to understand "advanced calculus or group theory" and your own GitHub indicates that you are stuck on "the hard stuff — abelian groups, curls, wedge products, Hessians and Laplacians" because you "don't understand the notation." Your opinion on the speed of advancement just doesn't seem informed?

Maybe this is an old post and your understanding has dramatically improved to the point where you're able to offer useful insight on ML/AI/self-driving?

https://benwheatley.github.io/blog/2024/03/11-12.00.16.html


1. Note time stamp: https://github.com/BenWheatley/char-rnn

2. Most ML is basic calculus and basic linear algebra — to the extent that people who don't follow it use that fact itself as a shallow argument.

3. I'm not asserting how fast it can advance; I'm asserting that the comparison with "6 million years of evolution" is as much a shallow hand-wave as saying it's trivial, as evidenced by what we've done so far.


I think the most damning thing about this whole saga for all of AI is how much energy and attention people are giving it.

In most established verticals, such a cartoonish scam would be dead on arrival. But apparently generative AI is still not mature enough to just move past this kind of garbage in a clean break.


To be fair, the AI industry is used to people manifesting out of nowhere, doing something stupid, and then ending up with revolutionary results. It's no surprise that there's a default optimism (especially since, if it pans out, it makes running high-quality AI stuff so much cheaper).

> I think the most damning thing about this whole saga for all of AI is how much energy and attention people are giving it.

That's because there is nothing better today, and nothing like it in history.


I think it is damning of the people who aren't paying attention, because this stuff at this trajectory is gonna be world-changing pretty soon.

It's not a cartoonish scam, and if it was, it took 48 hours to fall apart. Not worth getting the Jump to Conclusions™ mat out for.

This isn't said aggressively or to label, but rather to provide some context that it's probably not nearly as simple as you are suggesting: this thread looks like a bunch of confused engineers linking drama threads from laymen on Twitter/Reddit to each other, seeing pitchforks, and getting out their own. Meanwhile, the harsh conclusions they jump to are belied by A) having engineering knowledge _and_ looking into their claims, and B) reading TFA.


You should try Azure: it comes with dedicated capacity, which is typically a very expensive "call our sales team" feature with OpenAI.

OpenAI just launched the equivalent of Velvet as a full-fledged feature today.

But separate from that, you typically want some application-specific storage of the current "conversation" in a very different format than raw request logging.
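
To illustrate the distinction, here's a minimal sketch in Go (the type names are hypothetical): conversation storage keeps ordered turns keyed by conversation ID, rather than a flat dump of raw HTTP requests where each entry repeats the full history.

```go
package chat

import "time"

// Message is one turn in a conversation, stored in the shape the
// application actually replays to the model on the next request.
type Message struct {
	Role    string    `json:"role"` // "system", "user", or "assistant"
	Content string    `json:"content"`
	SentAt  time.Time `json:"sent_at"`
}

// Conversation is the app-level unit: ordered turns under one ID, as
// opposed to request logs, where the history is duplicated per call.
type Conversation struct {
	ID       string    `json:"id"`
	UserID   string    `json:"user_id"`
	Messages []Message `json:"messages"`
}
```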


> As leaked emails later showed, even Microsoft employees used the Embrace Extend Extinguish term to describe this project.

Source?


It's like using a hammer to turn a screw and calling it useless.

To envision what a next-generation model bound by the same constraints should do: it'd recognize that it can't count tokens, and use its code access to write code that solves the strawberry problem without prompting.

Asked to count cells, it'd be a model that could write and execute OpenCV tasks. Or, to go a step further, a multimodal model that can synthesize 10,000 variations of the target cell and finetune a model like YOLO on it autonomously.

I find arguments that reduce LLMs to "It can't do the simple thing!!!!" come from people unable to apply lateral thinking to how a task can be solved.


> To envision what a next-generation model bound by the same constraints should do: it'd recognize that it can't count tokens, and use its code access to write code that solves the strawberry problem without prompting.

The VQA problems I'm describing can seemingly be solved in one case, but not when combined with counting. Counting is fundamentally challenging for reasons that are sort of unknown, or perhaps known only to the very best labs trying to tackle it directly.

Another POV is that the stuff you are describing is in some sense so obvious that it has been tried, no?


I don't get what you mean by "unknown reasons"; we understand that counting tokens requires a type of introspection transformer models can't do while operating on tokens.

What I described is tried, and works, but the models are still not cheap/fast/reliable enough to always do what I described for every query.

The difference between what I described and directly asking the model to count is that we know the models can get cheaper, faster, and more reliable at what I described without any earth-shattering discoveries.

Like I don't see any reason why GPT 10 will ever be able to count how many letters there are in the word strawberry without a complete paradigm shift in model building... but going from GPT 3 to GPT 4 we already got a model that can always write the dead simple code required to count it out, and the models that can do so are already getting cheaper and faster every few months without any crazy discoveries.
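
For concreteness, a sketch in Go of the sort of "dead simple code" meant here (what a tool-using model might emit instead of introspecting its own tokenization):

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	word := "strawberry"
	// Counting characters directly, instead of asking the model to
	// reason over tokens it cannot see inside of.
	fmt.Println(len(word), "letters")                       // 10 letters
	fmt.Println(strings.Count(word, "r"), "occurrences of r") // 3 occurrences of r
}
```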


> What I described is tried, and works

Maybe they should make you Chief Scientist at OpenAI, I hear the position is open.


Don't get pissy with me because you decided to talk about something you don't understand the basics of.

That's not how that term ever worked, but go off.


Is this common in Go?

Just this week I was considering moving my early-stage stack to Go, but I repeatedly came across highly recommended packages that were dead. Gorilla had apparently died but come back in some capacity; Echo seems to be actively dying, as PRs aren't being addressed; Buffalo was an option I looked at and is now archived.

Does the "just use the stdlib" mentality mean everyone is just rolling the glue layers on their own 1000x over?


I've been coding in Go professionally for over a year now. So far, that's the pattern that I'm seeing as well. I'm not really a fan. A common answer for why not to use a lib is that it's trivial to roll your own.

I like Go as a language but not so much the community, because of the reasons above. I just don't want to implement cache/cron for the n-th time again. I'd rather spend that time building a new product instead, which is not the case when I'm using Go.
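
For context, this is roughly what the "trivial to roll your own" argument looks like in practice: a minimal sketch in Go of an in-memory TTL cache, assuming a mutex-guarded map is enough (no background eviction, no generics over value types).

```go
package cache

import (
	"sync"
	"time"
)

type entry struct {
	value     any
	expiresAt time.Time
}

// Cache is a mutex-guarded map with per-entry expiry; expired entries
// are lazily deleted on read rather than by a background sweeper.
type Cache struct {
	mu      sync.Mutex
	entries map[string]entry
}

func New() *Cache {
	return &Cache{entries: make(map[string]entry)}
}

func (c *Cache) Set(key string, value any, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = entry{value: value, expiresAt: time.Now().Add(ttl)}
}

func (c *Cache) Get(key string) (any, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || time.Now().After(e.expiresAt) {
		delete(c.entries, key)
		return nil, false
	}
	return e.value, true
}
```

Whether that still counts as "trivial" once you need eviction policies, metrics, and sharding is exactly the disagreement in this thread.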


There's a bunch of caching and cron libraries; you can pick one that best suits your needs, and you don't really need to implement that from scratch.


Can't speak for Buffalo, but there are many libraries that haven't been updated in a while, and there's a reason for that: they are complete.

There is no reason to update them; this isn't Node.js, where doing one thing depends on a billion packages and any one of them changing affects all downstream users.

The std library is awesome, backwards compatible, and lots of libraries just add onto it. The interfaces are compatible and you can just keep your code simple.
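
For what it's worth, since Go 1.22 the stdlib mux handles method- and path-parameter routing on its own, which covers a lot of what the router packages were for. A minimal sketch:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	// Method and path-parameter patterns are stdlib features as of Go 1.22.
	mux.HandleFunc("GET /users/{id}", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]string{"id": r.PathValue("id")})
	})
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```

Anything fancier (middleware chains, validation) is where the third-party packages still come in.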

I used to code a lot in Go. These days I'm back in node because it's just easier for me to move faster. I'm also not doing anything with concurrency, so I haven't had a real need for Go.

I think for core critical services I would use Go again; I just haven't needed to yet with my new project.


I can appreciate feature complete software but ignoring PRs and literally archiving the Github project means it's dead, not complete.


No, definitely not common. GoBuffalo had some initial design frustrations, but overall it was such a solid and well-maintained project. I was really surprised to discover that the GitHub repo was just quietly archived. The Gophers Slack has a channel for Buffalo, and users can still provide some help, but still.

Yeah, sadly this is the case for most web frameworks in Go, but if you're looking for something like that, check out Beego.

