
Blog

Do First, Optimize Later: Breaking the Cycle of Over-Optimization

I've come to a realization: I spend too much time planning and optimizing rather than actually doing. AI and automation have fueled my obsession with optimization, making me believe that if I refine a system enough, I’ll be more productive. But the truth is, optimization is only valuable when applied to something that already exists.

The problem is, I often optimize before I start. I think, “I need to make a to-do list,” but instead of actually making one and using it, I get lost in finding the best way to structure a to-do list, the best app, or the best workflow. Even right now, instead of writing down what I need to do, I’m writing a blog post about how I should be writing things down. This is the exact loop I need to escape.

Optimization feels like progress. It gives me the illusion that I’m working towards something, but in reality, I’m just postponing action. The efficiency of a to-do list doesn’t matter if I’m not using one. The best UX for adding tasks doesn’t matter if I never add tasks. The friction in a system isn’t relevant if I’m not engaging with the system at all.

The real issue isn't inefficiency—it's a lack of discipline. I tell myself I'm not doing things because the process isn't optimized enough, but the truth is simpler: I just haven't done them. My focus should be on building the habit of doing, not perfecting the process before I even begin.

Handing Over My Wallet to AI: Which Model Gave the Best Financial Advice?

AI Financial Advisors

Ever looked at your bank account and thought "I should probably talk to a personal financial advisor" — but then remembered that good advisors charge anywhere from $150 to $300 per hour? For most of us, professional financial advice feels like a luxury we can't justify. But we shouldn't have to wait until we're rich to get good financial advice!

That's where AI might change everything. Instead of paying hundreds per hour for financial advice, what if you could get personalized insights for the cost of a ChatGPT Plus subscription? To test this possibility, I connected RocketMoney to all my accounts—checking, credit cards, investments, the works—and exported 90 days of transaction data. Then I fed this financial snapshot to three AI heavyweights: ChatGPT, Claude, and Gemini.
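For anyone who wants to try this, the export step is just a CSV. Below is a minimal sketch of sanity-checking the data before handing it to a model; the file name and the column headers (Date, Category, Amount) are my assumptions for illustration, not RocketMoney's documented schema.

```python
import pandas as pd

# Load the exported transactions. Column names here are assumptions --
# adjust them to whatever headers your RocketMoney CSV actually uses.
df = pd.read_csv("transactions_90d.csv", parse_dates=["Date"])

# Transfers and credit card payments double-count real spending, which is
# why the prompt below warns the models about them. Flag them up front.
is_transfer = df["Category"].str.contains("Transfer|Payment", case=False, na=False)
spending = df[~is_transfer]

# Quick sanity check before involving any AI: 90-day totals by category.
print(spending.groupby("Category")["Amount"].sum().sort_values())
```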

But this isn't just about which AI is "smarter." Each platform brings different tools and features that its model can use to analyze the data. I asked each one to analyze my spending and produce a comprehensive financial plan and report, just like a human advisor would.

I kept it simple. Each AI received the same prompt:

You are an expert personal finance manager and wealth advisor. I have included my last 90 days of transactions. I need you to do an analysis of my current financial situation and give me a report and wealth plan. Keep in mind this csv is a consolidation for all of my accounts and includes transfers and credit card payments provided by RocketMoney.
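If you'd rather script this than paste into three chat windows, the same prompt can go through each provider's API. Here's a rough sketch using the OpenAI Python SDK; the model name and file path are placeholders, and the equivalent Anthropic and Google SDK calls would follow the same pattern.

```python
from openai import OpenAI

PROMPT = (
    "You are an expert personal finance manager and wealth advisor. "
    "I have included my last 90 days of transactions. I need you to do an "
    "analysis of my current financial situation and give me a report and "
    "wealth plan. Keep in mind this csv is a consolidation for all of my "
    "accounts and includes transfers and credit card payments provided by "
    "RocketMoney."
)

# Inline the exported CSV with the prompt (fine for 90 days of data;
# larger exports may need summarizing first to fit the context window).
with open("transactions_90d.csv") as f:
    csv_text = f.read()

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute whichever model you're testing
    messages=[{"role": "user", "content": f"{PROMPT}\n\n{csv_text}"}],
)
print(response.choices[0].message.content)
```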

The results? Let's just say one AI saved me more money in potential insights than a year's worth of its subscription costs—while another couldn't even handle the basics. Here's what happened when I turned my finances over to the machines.

Should You Even Trust Gemini’s Million-Token Context Window?

Haystack Made with GPT-4o


Imagine you’re tasked with analyzing your company’s entire database — millions of customer interactions, years of financial data, and countless product reviews — to extract meaningful insights. You turn to AI for help. You load all of the data into Google Gemini 1.5, with its new 1-million-token context window, and start making requests, which it seems to handle. But a nagging question persists: Can you trust the AI to accurately process and understand all of this information? How confident can you be in its analysis when it’s dealing with such a vast amount of data? Are you going to have to dig through a million tokens’ worth of data to validate each answer?

Traditional AI tests, like the well-known “needle-in-a-haystack” tests, fall short in truly assessing an AI’s ability to reason across large, cohesive bodies of information. These tests typically hide unrelated facts (needles) in an otherwise homogeneous context (the haystack), which shifts the focus to information retrieval and anomaly detection rather than comprehensive understanding and synthesis. Our goal wasn’t just to see if the model could find a needle in a haystack, but to evaluate whether it could understand the entire haystack itself.
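To make that distinction concrete, here is roughly what a classic needle-in-a-haystack probe looks like when you build one; the filler text, the needle, and both questions are purely illustrative.

```python
# A classic needle-in-a-haystack probe: one planted fact inside an
# otherwise homogeneous filler context.
filler = "The sky was a uniform grey and nothing of note happened. " * 2000
needle = "The secret launch code is 7-4-2-9. "
mid = len(filler) // 2
haystack = filler[:mid] + needle + filler[mid:]

# Retrieval-style question: answerable by locating a single sentence.
retrieval_prompt = haystack + "\n\nWhat is the secret launch code?"

# Synthesis-style question: requires reasoning over the whole context,
# which is exactly what needle tests don't measure.
synthesis_prompt = haystack + "\n\nWhat patterns run through this text as a whole?"
```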

Using a real-world dataset of App Store information, we systematically tested Gemini 1.5 Flash across increasing context lengths. We asked it to compare app prices, recall specific privacy policy details, and evaluate app ratings — tasks that required both information retrieval and reasoning capabilities. For our evaluation platform, we used LangSmith by LangChain, which proved to be an invaluable tool in this experiment.
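As a rough illustration of that setup (not our exact harness; the LangSmith evaluation layer is omitted), a single trial with the google-generativeai SDK might look like the sketch below. The record fields and the question are stand-ins for the real App Store data.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")

# Stand-in records; the real test used rows from the App Store dataset.
apps = [
    {"name": f"App {i}", "price": round(i % 10 * 0.99, 2), "rating": (i % 50) / 10}
    for i in range(10_000)
]

def build_context(n_apps):
    """Concatenate app records until the context reaches the target size."""
    rows = [
        f"{a['name']} | price: ${a['price']} | rating: {a['rating']}"
        for a in apps[:n_apps]
    ]
    return "\n".join(rows)

# Sweep increasing context lengths, asking the same reasoning question at each.
for n_apps in (100, 1_000, 10_000):
    prompt = build_context(n_apps) + "\n\nWhich app under $5 has the highest rating?"
    response = model.generate_content(prompt)
    print(n_apps, "->", response.text.strip()[:120])
```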

The results were nothing short of amazing! Let's dive in.