Prompt Caching¶
We'll load all of the app store data into context and ask it questions, with follow-ups.
%load_ext autoreload
%autoreload 2
%pip install -qU google-generativeai python-dotenv pandas anthropic
Setup¶
Load Env¶
import os
import google.generativeai as genai
from google.generativeai import caching
import datetime
import time
from dotenv import load_dotenv
from utils import get_app_store_data, get_context
load_dotenv()
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
Get all of the app store data¶
# Get app store data
df = get_app_store_data()
# Get 150,000 tokens of context
app_data_str, app_df = get_context(150000, df)
System Prompt and Examples¶
system_prompt = """You are an App Store Data Analyzer. You should analyze the provided app store data and answer the user's questions.
- Only use the App Store Data provided in your context.
- Do not answer questions you are not confident in answering because the answer can't be found in the provided context.
- Think through your answer slowly, step by step before providing the final answer."""
print(app_data_str)
examples = [
"What app has the most ratings?",
"What are the features for the app 'Online Head Ball'?",
"What is the most expensive app in the 'Games' category?",
# "Which app has the longest description in the app store?",
# "What is the average rating of all free apps?",
# "Identify any app that is paid and has fewer than 100 ratings.",
# "List all apps that are categorized under 'Games' and have more than 400,000 ratings.",
# "Which app has the lowest price in the app store?",
# "Find all apps that have a title starting with the letter 'A'.",
# "What is the total number of ratings for all apps in the 'Health & Fitness' category?"
]
Gemini¶
The Gemini docs say:
The model doesn't make any distinction between cached tokens and regular input tokens. Cached content is simply a prefix to the prompt.
That means we are pretty much only able to cache a static prefix: the system prompt plus context that never changes. So you wouldn't be able to cache search results or function call responses that show up mid-conversation, and we can't cache tools. It does make sense, though: in an agent, the only thing that is static is the system prompt.
# Create a cache with a 1 minute TTL
cache = caching.CachedContent.create(
    model='models/gemini-1.5-flash-001',
    display_name='App Store Data',  # used to identify the cache
    system_instruction=system_prompt,
    contents=[app_data_str],
    ttl=datetime.timedelta(minutes=1),
)
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
def chat_gemini(query: str):
    # Call the model; the cached system prompt + app data act as the prompt prefix
    response = model.generate_content([query])
    print(response.usage_metadata)
    print(f"Question: {query}")
    print(f"Answer: {response.text}")
    print("\n======\n")
    return response
for example in examples:
    chat_gemini(example)
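To check that follow-up questions are actually being served from the cache, and to clean up when we're done, we can look at cached_content_token_count in the usage metadata and manage the cache object directly. A rough sketch; the update/list/delete calls are my reading of the google-generativeai SDK, so worth double-checking:

# Rough sketch: verify cache hits and manage the cache's lifetime.
# Assumes the `cache` and `model` objects created above still exist.
response = model.generate_content([examples[0]])
# Should be roughly 150k if the cached app data was used as the prompt prefix
print(response.usage_metadata.cached_content_token_count)

# Extend the TTL so later follow-ups don't miss the 1 minute window
cache.update(ttl=datetime.timedelta(minutes=10))

# List and delete caches so we stop paying for cache storage
for c in caching.CachedContent.list():
    print(c.display_name)
cache.delete()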
Anthropic¶
import anthropic
client = anthropic.Anthropic()
# Literal of models
from typing import Literal
models = Literal["claude-3-5-sonnet-20240620", "claude-3-haiku-20240307", "claude-3-opus-20240229"]
def chat_anthropic(query: str, model: models = "claude-3-5-sonnet-20240620"):
    response = client.beta.prompt_caching.messages.create(
        model=model,
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": system_prompt,
            },
            {
                "type": "text",
                "text": app_data_str,
                # Cache everything up to and including this block (~5 minute lifetime)
                "cache_control": {"type": "ephemeral"}
            }
        ],
        messages=[{"role": "user", "content": query}],
    )
    print(response.usage)
    print(f"Question: {query}")
    print(f"Answer: {response.content[0].text}")
    print("\n======\n")
    return response
for example in examples:
    chat_anthropic(example)
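The usage object on each response should tell us whether a call wrote the cache or read from it; with the prompt caching beta the relevant fields are cache_creation_input_tokens and cache_read_input_tokens (my reading of the beta API, worth verifying). A quick sketch:

# Quick sketch: compare cache behaviour across two consecutive calls.
# Expectation: the first call pays to write the cache, the second reads from it
# (as long as it lands within the ~5 minute ephemeral window).
first = chat_anthropic(examples[0])
second = chat_anthropic(examples[1])

for label, resp in [("first", first), ("second", second)]:
    usage = resp.usage
    print(
        f"{label}: cache_creation={usage.cache_creation_input_tokens}, "
        f"cache_read={usage.cache_read_input_tokens}, "
        f"uncached_input={usage.input_tokens}"
    )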
Wow, the rate limits are insanely low. We get 40k tokens per minute with only 1M tokens PER DAY. We really can't do anything with that, so I think that's where we end things. It would be interesting to try, but it doesn't look like this would even be an option for our experiment without getting a custom account.
The max tier gives us 50,000,000 tokens per day with 400k tokens per minute. That's enough to run about 2 experiments per minute.
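A quick back-of-the-envelope check on that, assuming each experiment sends the full ~150k-token context:

# Rough arithmetic behind "about 2 experiments per minute" on the max tier
context_tokens = 150_000      # tokens per experiment (our app store context)
tpm_limit = 400_000           # max-tier tokens per minute
tpd_limit = 50_000_000        # max-tier tokens per day

print(tpm_limit // context_tokens)   # 2 experiments per minute
print(tpd_limit // context_tokens)   # 333 experiments per day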