
LLM Beefer-Upper



FAQ

How come the results from using this app are so good?

What a perfect first question. In short, while myriad 'prompt engineering' strategies have been proposed since GPT-3 first emerged, chain of thought has been by far the most successful – see this blog referencing a 2024 New York University study: Chain of Thought Reigns Supreme: New Study Reveals the Most Effective AI Prompting Technique. Too often people see the first output from an LLM and assume that's all it's capable of – but that's the equivalent of it blurting out its first answer. By adding stages for it to critique, reflect and improve, you're giving the LLM a chance to think before it speaks. Combined with explicit prompting and a supplied knowledge base, you can see just how good the most advanced LLMs already are.
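To make that concrete, here is a minimal sketch of the draft-critique-improve chain, assuming a hypothetical call_llm() helper wrapping whichever chat API you prefer. It illustrates the technique; it is not the app's actual code.

```python
# A minimal sketch of the draft -> critique -> improve chain described above.
# call_llm() is a hypothetical stand-in for any chat-completion client.

def call_llm(messages: list[dict]) -> str:
    """Stand-in for a real chat API call (plug in your preferred LLM client)."""
    raise NotImplementedError

def beefed_up_answer(task: str, knowledge: str = "") -> str:
    messages = [{"role": "user", "content": f"{task}\n\nKnowledge:\n{knowledge}"}]
    draft = call_llm(messages)  # the model 'blurting out' its first answer
    messages += [
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "Critique the answer above: list its weaknesses, omissions and errors."},
    ]
    critique = call_llm(messages)  # a dedicated critique pass
    messages += [
        {"role": "assistant", "content": critique},
        {"role": "user", "content": "Rewrite the original answer, fixing every issue raised in the critique."},
    ]
    return call_llm(messages)  # the improved final answer
```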

Why should I pay for that? Surely I can just do this in ChatGPT or the Claude chat interface?

You're technically correct, which is the best kind of correct. Please do experiment with these multi-agent strategies yourself – it pays off. The problem is that doing it manually every time is laborious to the point that most people don't bother and just live with sub-optimal outputs. The value of this app is that it automates the multi-agent chain of thought process, so you get quality results quickly instead of typing out – or pasting in – your agent follow-up prompts every single time. 99% of people don't even have their own prompt templates, so this app lets you create your own custom templates with up to 4 LLM agents, or use and adapt the pre-built Templates.
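For illustration, a multi-agent template is essentially just an ordered list of follow-up instructions, one per agent. The structure below is hypothetical (not the app's internal format) and could be run through a loop like the sketch in the previous answer:

```python
# Hypothetical example of a 4-agent prompt template (not the app's internal format):
# each entry is the instruction given to the next agent, which also sees everything
# produced by the agents before it.
BEEF_TEMPLATE = [
    # Agent 1: produce the first draft from the prompt and knowledge text
    "Complete the task below using only the supplied knowledge text.",
    # Agent 2: critique
    "Act as a demanding reviewer: identify every weakness, gap and inaccuracy in the previous response.",
    # Agent 3: improve
    "Rewrite the response, addressing every point raised in the critique.",
    # Agent 4: final polish
    "Proofread and tighten the rewrite so it is clear, accurate and well structured.",
]
```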

Who are you and why should I trust what you say?

Firstly, granting 'trust' based on credentials is a risky stance; it's always better to have an empirical, experimental mindset and test directly. But if it helps, I'm Lee Mager, currently Head of Digital Transformation Projects and Education Delivery in the LSE's Dept of Geography and Environment. I'm a 2-time winner of the LSE's 'Creative Innovator' award. I hold multiple Microsoft certifications including 'Azure AI Engineer', 'Power Platform Functional Consultant' and 'Power Platform App Maker'. I'm currently known as 'the AI guy' (I authored the LSE's guidance on using generative AI for research), but for the previous few years it was 'the automation guy'. Really I'm 'the productivity guy', and (robotic) process automation and gen AI just happen to be extremely valuable for productivity. I hold a PhD in Sociology from LSE centred on discourse analysis, and for my MSc in Sociology at LSE I was valedictorian, winning both the best overall performance and best dissertation prizes in the Department. Ultimately, I'm just obsessed with doing things well. This app was originally built for myself: I got such fantastic results from implementing multi-agent critique and reflection stages with LLMs, but it was a pain to do manually each time. I use this app constantly and consider it the most useful personal productivity tool I've ever created (out of hundreds of automations I've built for myself over the years). Whether you find it useful too is your call. Feel free to connect with me on LinkedIn and follow me on Twitter.

Why can’t cheaper LLM models be offered? £0.27 for a steak output is a bit steep

This app has only one goal: getting the best quality outputs possible from the best LLM available. It's not designed for every little task, but for those where high quality outputs are important. For medium quality outputs you can always choose the ribs (3 LLM agents, 3 credits or £0.16) or burger (2 LLM agents, 2 credits or £0.11) options. But for anything where quality is important, it's worth choosing the steak option.

This whole 'chain of thought', 'multi-agent prompt strategy' stuff seems excessive. I actually find the first responses from ChatGPT and Claude good enough as they are.

That's awesome, but sometimes you don't just want 'good enough'. This app isn't for everyday simple use cases; it's for when the output quality really matters. As of 2nd Aug 2024, post-task completion feedback for LLM Beefer-Upper shows 100% of respondents agreeing that the final agent response was better than the first response. The first response could of course still be 'good enough' for a particular user, but for those who want better results, this app reliably delivers.

Why can't it accept my full 50,000 word PDF like ChatGPT?

Any standard academic paper under 10k words will work fine. The problem is that RAG (Retrieval Augmented Generation) – which is what ChatGPT uses for reading files – is extremely unreliable. As of Summer 2024, RAG is still essentially a hack that's far from solved, and you should never trust what GPT says when you upload huge (or multiple) PDFs; it's seriously asking for trouble. This app lets you upload a PDF, but the text is simply added to the prompt's knowledge text itself, which is the only effective way to ensure the LLM has it in full and can answer accurately. This also means you can't work with huge knowledge bases, but that will come as context length constraints ease. In the meantime, I can assure you that any service letting you upload hundreds of thousands of words of knowledge will at best only be good for simple queries, not the kinds of complex, high value tasks that the LLM Beefer-Upper is designed for.
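As a rough sketch of the 'whole document in the prompt' approach (as opposed to RAG), the snippet below extracts a PDF's text with the pypdf library and refuses anything too long to fit. The library choice and the character cap are illustrative assumptions, not the app's actual pipeline:

```python
# Illustrative only: put the whole document into the prompt rather than retrieving chunks.
from pypdf import PdfReader

MAX_KNOWLEDGE_CHARS = 60_000  # hypothetical cap, roughly a 10k-word paper

def pdf_to_knowledge_text(path: str) -> str:
    reader = PdfReader(path)
    # Concatenate the text of every page so the model sees the document in full.
    text = "\n".join((page.extract_text() or "") for page in reader.pages)
    if len(text) > MAX_KNOWLEDGE_CHARS:
        raise ValueError("Document too long to fit in the prompt in full; trim it first.")
    return text

# The extracted text then goes straight into the prompt:
# prompt = f"{task}\n\nKnowledge:\n{pdf_to_knowledge_text('paper.pdf')}"
```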

Do I have to provide knowledge? What if I just want some new creative output based on a prompt alone?

That’s absolutely fine, the knowledge text field isn’t required. But if you need outputs to be accurate, supplying the relevant knowledge text is far better than hoping the model’s training data has the information you need.

Why do you only allow Google login?

The app will add login options for Facebook, Microsoft and Apple in due course.

What data are you keeping about me?

The only data stored is your email and the profile name associated with your Google account, which templates you've added, and how many times you've used them. If you ever check the 'save output' box when running a template, the app will also store that output so you can see it in future. If in doubt, keep that box unchecked; it's there for convenience and should only be used for non-sensitive, non-personal data.

Can I use this to write my essays?

Assuming you’re a university-level student, I can assure you that even the best LLMs working as a team can’t produce high quality academic work. You can probably get good quality ‘exam essay’ style responses that aren’t too dependent on literature or empirical data, but academic writing is (mostly) based on deep, rigorous and critical engagement with a substantial body of work along with applied research. LLMs cannot intelligently hold a dozen articles or books in their context memory and generate a high level academic paper from scratch. Where this app can help a lot though is in critiquing and improving your existing draft, which has the bonus of being 100x more ethical and valuable for your long-term learning progress than just typing ‘Oi GPT, do my essay for me yeah?’ on your phone while you’re in the pub.

Why is the character limit for prompt and knowledge so strict?

Because the text accumulates with each new agent: by the time you get to agent 4, it receives not only the full prompt and knowledge, but the three previous LLM responses too. That risks the run failing because the model's maximum token count has been breached, so character limits are enforced to prevent that from happening.
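As a back-of-the-envelope illustration of why the limits exist (assuming roughly 4 characters per token and a 128k-token context window; both figures are assumptions, not the app's real numbers):

```python
CHARS_PER_TOKEN = 4             # rough average for English text (assumption)
CONTEXT_LIMIT_TOKENS = 128_000  # assumed model context window

def fits_in_context(prompt: str, knowledge: str, previous_responses: list[str]) -> bool:
    # By agent N the model sees the prompt, the knowledge text AND every earlier
    # response, so the input keeps growing as the chain progresses.
    total_chars = len(prompt) + len(knowledge) + sum(len(r) for r in previous_responses)
    return total_chars / CHARS_PER_TOKEN < CONTEXT_LIMIT_TOKENS
```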

You said there’s a chance the app will fail if the max number of tokens is breached, would I still be charged in that case?

No – credits are only deducted once the final response has come in.

My final output wasn't awesome, can I get a refund?

This app helps you maximise quality compared to an LLM's first response. Obviously it can't guarantee high quality, because that depends on what the LLM is capable of along with the quality of your prompts. As of now, as good as the best LLMs are – especially when using chain of thought / reflection – they're still artificial and cannot compete with the best humans. But as you probably know if you're even considering an app like this, if you can save 75% of the time and effort on a task with a decent but not perfect LLM output that requires minimal correction, that's a massive productivity boost. Life is short, qualitative language tasks are (often) dull, so take advantage of this tech to free up your time before the incoming superintelligent AI wipes out humanity.

Service

The LLM Quality Beefer-Upper is a tool designed to simplify and automate the process of enhancing the quality of AI-generated content using advanced language models. It allows users to create, refine and run custom multi-agent prompt templates to maximise output quality without having to manually copy and paste in a standard chat interface.

Authentication and Data Usage

Sign-in is handled via your Google account. The app stores only your email and profile name, which templates you've added and how many times you've used them, as detailed in the FAQ above.

Privacy and data security

Outputs are stored only if you tick the 'save output' box when running a template; leave it unchecked for anything sensitive or personal.

Pricing

The app uses a pay-as-you-go credit-based system to ensure fair usage and flexibility:

Burger – 2 LLM agents, 2 credits (£0.11)
Ribs – 3 LLM agents, 3 credits (£0.16)
Steak – 4 LLM agents (£0.27)

Customer Support

For any queries or support needs, please email lee@automager.co.uk

Cancellation Policy

As we operate on a pay-as-you-go credit system, there are no subscriptions to cancel. Your account will remain active as long as you wish to use our service.

Dispute Resolution

In the unlikely event of a dispute, please contact the support email above.

Terms and Conditions of Promotions

Any promotional offers or discounts will be clearly communicated via email or on our website. Terms specific to each promotion will be provided at the time of the offer.

Security and Privacy

We take the security of your data seriously: the app stores only the minimal account data described in 'What data are you keeping about me?' above, and outputs are retained only when you explicitly choose to save them.