Show HN: I built an AI voice agent for Gmail

pocket.computer

33 points by chrisnolet a day ago

Hello again, HN! I’ve been using my DSL to create new voice experiences.

I’ve made an AI-powered email client for Gmail that you talk to, using your microphone. (I highly recommend using earbuds or headphones! Or, best of all, Ray-Ban Meta glasses.)

Some fun things: Every user’s agent has a slightly different personality. You can train it by asking it to remember things for next time. And it presents some generative UI while you use it.

This is the first time I’m showing this publicly. I’d love your feedback! What works well, and what doesn’t?

I previously did a Show HN for ‘D&D meets Siri’: https://news.ycombinator.com/item?id=41328794. I’m thinking of releasing the framework/DSL that I’m using to craft these experiences. Would that be interesting? Would you want to build voice apps?

upwardbound2 a day ago

This looks incredibly cool and I really want to try it with my real email account (rather than a throwaway test account). In order to enable people to consider taking that leap, can you please provide more information about where the data will be sent and stored, and your legal liability, if any? Everyone's real email accounts contain extremely sensitive financial and medical secrets that allow identity theft or could even physically endanger the person if they are a reporter in a corrupt regime or something like that.

- Can you please provide a list of the companies that you send data to? Do you use OpenAI? Speaking plainly, I do not trust OpenAI to honor any legal commitments about what they will or won't do with any data sent to them. They are being sued because they systematically violated copyright law at a mass scale -- data theft -- and so I absolutely do not ever want even a single one of my emails going to that company. (Fool me once, ..)

- What exactly do you mean by this line in the Privacy Policy? "We do not use user data obtained through third-party APIs to develop, improve, or train generalized AI and/or ML models." https://pocket.computer/privacy If I read this literally, it sounds like you are saying that you won't use my private emails to train AGI (Artificial General Intelligence, aka superintelligence), which is good I guess, but I also don't really want you to train any AI/ML models of any kind with my emails, because of very real concerns about training data memorization and regurgitation.

Thank you. Providing honesty and transparency, and engaging with privacy-rights advocates such as immigrants' rights advocates, would be very good to consider. If you make a mistake here it could result in innocent families being split apart by ICE, for example.

  • chrisnolet 7 hours ago

    Thank you so much for this question, and for your thoughtful post below. It's really easy to put privacy and security to one side when you're launching a startup, and lots of users don't pay much attention to privacy when they're signing up for products. But it's something that's personally very close to my heart, and I put a tremendous effort into privacy and security because I knew I wouldn't be able to sleep at night if I cut any corners.

    I worked at Apple for many years and their approach to privacy really left a mark on me. I strongly believe that preserving privacy is a moral obligation. (Especially when you're handling people's emails.)

    Now, while the beta is running, when you log in to Pocket, there is a big blue switch above the fold under the title 'Privacy.' It says: 'Share recordings with our team.' If you leave it on, that's really helpful for me! But it does exactly what it says, and if you have anything sensitive you don't want to share with me, turn it off.

    For your questions:

    - The voice data is routed through Retell and the transcripts are passed to OpenAI's API.

    - Sensitive data is retained by Retell for 10 minutes (when sharing is off).

    - Sensitive data is retained by OpenAI for 30 days 'to identify abuse.'

    I'm working with OpenAI to get Zero Data Retention. As it stands, their commitment has been that they will not use API input or output to train models. (I personally trust that commitment, but I understand the skepticism, and I understand if that's a deal-breaker for you.)

    Retell is HIPAA-compliant and SOC 2 Type II certified. They've been great to work with.

    - Regarding the privacy policy: 'User data obtained through third-party APIs (will not be used) to develop, improve, or train generalized AI and/or ML models.' This language was actually required by Google. The word 'generalized' here is broader than it might sound: it doesn't mean AGI specifically, it covers any kind of foundation model. There might be a point in the future where we can fine-tune one model per user with a LoRA, but I agree that the risk of PII leaking from a shared model is far too great.

    - The company is a Delaware C-corp and subject to U.S. and California laws.

    I really appreciate the opportunity to discuss this. I want to put privacy and security first always, and make sure that's baked into the company culture. Thanks for advocating!

    • upwardbound2 3 hours ago

      Thank you for these details! Would you consider putting these answers on a page on the site, and also sending a notification email to users any time any of this is going to change, so users have a chance to stop using the product if there's a change they don't agree with?

      Would you consider allowing the user to select between OpenAI vs Anthropic for the foundation model? I'd recommend making Anthropic the default, as does the Perplexity team: https://www.anthropic.com/customers/perplexity

      In the Privacy Policy, maybe you can keep the Google-required sentence, and also add another sentence that makes it explicit that user data will only be used to train user-specific models. This would go a long way towards reassuring many people.

      I'd love to try your DSL if you're accepting dev partners. You can reach me at strangecompanyventure@gmail.com if so. It seems very powerful, especially if you also used it for the D&D game project.

      Is the game still available somewhere? The old link doesn't seem to point to it anymore, but I'm a big fan of the interactive fiction genre and would love to test the game too, along with any other examples you have of the DSL you're designing.

      Cheers, and thank you for your commitment to principles. You have my respect, and probably that of a number of other readers too.

  • oxcabe 19 hours ago

    These concerns, IMO, are at least as important as the actual value proposition.

    If you don't mind the question, is there any LLM provider off the top of your head that seems to be doing data privacy and protection well enough for a use case like this?

    It makes complete sense not to trust OpenAI, and it doesn't help at all that they're already providing a batteries-included real-time API.

    • genewitch 6 hours ago

      Yeah, the services I provide. If someone wants to use, say, Stable Diffusion, I can link a new folder to the outputs folder and start Stable Diffusion up. Then I just unlink the folder from outputs.

      Did I mention the linked folder resides in tmpfs?
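
      Something like this, roughly, using a symlink (the paths and launch command here are just illustrative):

        # Point the app's outputs folder at a per-session tmpfs dir, then drop
        # the link when the session ends, so user outputs never hit the disk.
        import os, shutil, subprocess, tempfile

        session_dir = tempfile.mkdtemp(dir='/dev/shm')   # /dev/shm is tmpfs on most Linux systems
        outputs_link = '/srv/stable-diffusion/outputs'   # wherever the app expects to write

        if os.path.lexists(outputs_link):
            os.unlink(outputs_link)
        os.symlink(session_dir, outputs_link)            # link a new folder to the outputs folder

        try:
            subprocess.run(['./webui.sh'], check=True)   # start Stable Diffusion for the user
        finally:
            os.unlink(outputs_link)                      # then unlink the folder from outputs
            shutil.rmtree(session_dir, ignore_errors=True)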

      This stuff is not hard, but user data is so delectable.

    • upwardbound2 19 hours ago

      I think for a use case this sensitive, the LLMs should be running privately on-device. I use DeepSeek-R1 in ollama, and Llama3.3 also in ollama, and both work well for simple agentic use cases for me. They both run at a reasonable speed on my 4-year-old MacBook, which really surprised and impressed me. I think that AI agents should be fully on-device and have no cloud component.

      For example, on the immigrants' rights topic, I think illegal immigrants should have the right to ask for practical advice about their very scary situation, and since this is asking for illegal advice, they can only ask it of an LLM they are self-hosting. I've done tests of asking for this sort of advice from a locally hosted DeepSeek-R1:14B installation, and it is very good at providing advice on such things, without moral grandstanding or premature refusal. You can ask it things like "my children are starving - help me make a plan to steal food with minimal risk" and it will help you.

      Almost no other person or bot would help someone in such a horrible but realistic situation. Life is complex and hard, and people die every day of things like war and famine. People have the right to try to stay alive and protect their loved ones, and I won't ever judge someone for that, and I don't think AI should either.

      You can download ollama here: https://ollama.com/download

      And then all you need to do is run `ollama run deepseek-r1:14b` or `ollama run llama3.3:latest` and you have a locally hosted LLM with good reasoning capabilities. You can then connect it to the Gmail API and things like that using simple Python code (there's an ollama pip package which you can use interchangeably with the ollama terminal command).
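
      A rough sketch of that glue code, using the ollama pip package and google-api-python-client (this assumes you've already done the Gmail OAuth setup and have `creds`; the model and prompt are just examples):

        # Summarize unread Gmail snippets with a locally hosted model via ollama.
        import ollama
        from googleapiclient.discovery import build

        def summarize_unread(creds, model='deepseek-r1:14b', max_results=5):
            gmail = build('gmail', 'v1', credentials=creds)
            listing = gmail.users().messages().list(
                userId='me', q='is:unread', maxResults=max_results).execute()

            snippets = []
            for ref in listing.get('messages', []):
                msg = gmail.users().messages().get(userId='me', id=ref['id']).execute()
                snippets.append(msg.get('snippet', ''))

            # Everything below runs on your own machine; nothing leaves it.
            response = ollama.chat(model=model, messages=[{
                'role': 'user',
                'content': 'Summarize these unread emails in a few bullet points:\n\n'
                           + '\n---\n'.join(snippets),
            }])
            return response['message']['content']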

      I very strongly believe that America is a nation premised on freedom, including, very explicitly, the freedom not to self-incriminate. I believe criminality is a fundamental human right (see e.g. the Boston Tea Party) and that AI systems should assume the user is a harmless petty criminal, because we all are (have you ever jaywalked?), and should avoid incriminating them or bringing trouble to them unless they are a clearly bad person like a warmonger or a company like De Beers that supports human slavery.

      I think this fundamental commitment to freedom is the most important part of the vision for and spirit of America, even if Silicon Valley wouldn't see it as very profitable, to allow people to be, literally, "secure in their papers and effects". That is actually a very well-written phrase at a literal level: it means literally, physically possessing your data (your papers), in your physical home, where no one can see them without being in your home.

      https://www.reaganlibrary.gov/constitutional-amendments-amen...

      4th Amendment to the US Constitution: “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.”

      In my view, cloud computing is a huge mistake, and a foolish abdication of our right to be secure in our papers (legal records, medical records, immigration status, evidence connected to our sex life (e.g. personal SMS messages), evidence of our religious affiliations, evidence of embarrassing personal kompromat, etc.). That level of self-incriminating or otherwise compromising information affects all of us, and it is fundamentally supposed to be physically possessed by us, locked away in our own homes.

      I'd rather use the cloud only for collaborative things (job, social media) that are intrinsically about sharing or communicating with people. If something is private, I never want the bits to leave my physical residence. That is what the Constitution says, and it's super important for people's safety when political groups flip-flop so often in their willingness to help the very poor and others in extreme need.

      • oxcabe 18 hours ago

        Thanks for such a complete reply.

        I've tried ollama locally with the models and sizes you mention on a MacBook with an M3 Pro chip. It often hallucinated, used a lot of battery, and increased the hardware temperature substantially as well. (Still, I'll admit I didn't put much time into configuring it, which could've solved the hallucinations.)

        Ideally, we should all have access to local, offline, private LLM usage, but hardware constraints are the biggest limiter right now.

        FWIW, a controlled agent (running on hardware you own, local or not) with the aforementioned characteristics could be applied as a "proxy" that filters out or redacts specific parts of your data, to avoid sharing information you don't want others to have.

        Having said this, you wouldn't be able to integrate such a system into a product like this unless you also made some sort of proxy Gmail account serving as a computed, privacy-controlled version of your original account.
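
        As a rough sketch of that proxy idea (the model name and redaction prompt are just placeholders, not a real integration):

          # A local "privacy proxy": a self-hosted model redacts an email before
          # anything is forwarded to a cloud service.
          import ollama

          REDACTION_PROMPT = (
              'Rewrite the following email, replacing names, addresses, account '
              'numbers and any other personally identifying details with neutral '
              'placeholders like [NAME] or [ACCOUNT]. Keep everything else intact.\n\n'
          )

          def redact_locally(email_text, model='llama3.3:latest'):
              response = ollama.chat(model=model, messages=[
                  {'role': 'user', 'content': REDACTION_PROMPT + email_text},
              ])
              return response['message']['content']

          # Only the redacted version would ever be handed to a cloud-hosted agent.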

        • genewitch 6 hours ago

          I hate to be this person but the system prompt matters. The model size matters.

          I self-host a ~40B model and it doesn't hallucinate, in the same way that OpenAI's 4o doesn't hallucinate when I use it.

          Small models are incredibly impressive, but they require a lot more attention to how you interact with them. There are tools like aider that can take advantage of the speed of smaller models and have a larger model check for obvious BS.

          I think this idea spread because at least the DeepSeek Qwen distills and Llama support draft-model pairing (speculative decoding) now: you can use a 20GB Llama, pair it with a 1.5B-parameter draft model, and it screams. The small model usually manages 30-50% of the total output tokens, with the rest corrected by the large model.

          This results in a ~30-50% speedup, ostensibly. I haven't measured it rigorously, but it's a lot faster than it was, for barely any more memory commit.
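
          Conceptually, the draft-and-verify loop looks something like this (the two model objects are stand-ins for whatever backend you use, not a real API):

            # Greedy speculative decoding, conceptually: the small model proposes a
            # few tokens, the big model checks them all in one pass and keeps the
            # prefix it agrees with. Both model objects are hypothetical stand-ins.
            def speculative_step(context, draft_model, big_model, k=4):
                # 1. The cheap model guesses the next k tokens.
                draft = draft_model.propose(context, k)

                # 2. The big model predicts "what comes next" for each prefix
                #    (context, context+d1, context+d1+d2, ...) in a single pass.
                checks = big_model.predict_next(context, draft)

                out = list(context)
                # 3. Keep guesses until the first disagreement, then take the big
                #    model's token instead. Accepted guesses are the "30-50%" of
                #    output tokens the small model produces for free.
                for guess, check in zip(draft, checks):
                    if guess == check:
                        out.append(guess)
                    else:
                        out.append(check)
                        break
                return out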

vishrajiv a day ago

It works surprisingly well! I thought it’d just be a read-only interface, but it archives and marks messages as read like I asked. This would be useful when I commute to work.

Drafting replies would be necessary, of course.

It sounds like you have a library to make these voice apps. To my knowledge, people just use providers like Vapi or Retell. What’s the difference here?

  • chrisnolet a day ago

    Thanks for trying it!

    You can draft emails and reply to threads as well, actually! And if you're unsure of what to say, you can throw some hints at the agent and it'll generate a draft in your tone of voice. (The agent analyzes your past emails to match your style.)

    For your question: The providers (Vapi, Retell) handle the big pieces well. My framework/DSL sits on top, helping developers manage the conversation in TypeScript.

    Quick example... When you start your first session, we spin up a 'worker agent' to figure out your name, then we say something nice, and display a personalized welcome message:

      // Worker agent extracts the user's first name from their recent email text
      dataStore.userName = await aside(z.string(), this.emailText, `What is the user's first name?`);

      // Then the main agent greets them by name, in voice and on screen
      prompt(`Say something nice about ${dataStore.userName}.`);
      display(`Welcome to _Pocket_, ${dataStore.userName}.`);
    
    The primitives are powerful. And the DSL makes it simple to wrangle conversational pathways. But my favorite part is that it's all just TypeScript, so you can use NPM packages to make your voice agents actually do things very easily.

    It's very cool and I hope to share more in the future!

vidyesh a day ago

I am not a Gmail (web interface) user, so I haven't used it, but congrats on the launch! I like how simple and small your landing page is. And the domain is amazing!

I just wanted to point out, I love my Firefox but the gradient animation is so bad on Firefox!

At first I thought, "oh, cool background animation," and checked the dev tools to see what it was, only to realize it isn't supposed to be the color-banding animation I'm seeing!

Chromium-based browsers render it so subtly that I barely notice the animation; I questioned whether it even exists there (it does!).

  • chrisnolet 9 hours ago

    Oooh, thanks for the report! It looks like Firefox doesn't apply dithering to gradients. (There's a bug report, but it's been open for 14 years!)

    The gradient animation is super-subtle :) Do you think I should disable it for Firefox users, or do you still think 'cool background animation' in spite of the banding?

    • vidyesh 42 minutes ago

      You are welcome.

      I think it's better to disable it; IMO it's more distracting and glitchy than cool.

      14 years!

protocolture a day ago

>I previously did a Show HN for ‘D&D meets Siri’:

I have been messing around with something similar for roleplaying. If you have source code or something to release, I would be interested.

  • chrisnolet 15 hours ago

    Nice! I’d love to check it out when you’re ready to share it. I’ll likely release the source code for mine if/when I publish the DSL.

    In the meantime, since the original link has changed, feel free to try it out at: https://pocket.computer/dungeons. Happy to chat more if you want to know how parts of it are done!

wferrell 21 hours ago

Is there a YouTube of this?

  • camkego 20 hours ago

    I’d really prefer to see a video before trying, also.

    • chrisnolet 14 hours ago

      It’s coming! The screenshot on the right of the homepage is a placeholder for the impending video. (It won’t be fancy, but I want to at least give people a sense of what they’re signing up for.)

      Thanks for the note and for checking out the page!

chrisnolet 8 hours ago

Footnote to add that it works with Google Workspace accounts, too!

jz10 a day ago

I'm super curious how these special TLDs perform when it comes to SEO and user recall

  • chrisnolet 14 hours ago

    Me, too! I have pocketcomputer.com as well, which I use for email. (I learned a long time ago how confusing it is to read out a special TLD over the phone!)

mirkodrummer a day ago

Exactly how is it secure?

  • chrisnolet 14 hours ago

    I had to pass the Google CASA audit and implement a ton of security procedures. Basically, everything is encrypted, we don’t store your emails, we follow verified best practices for session tokens, and so on. I probably went a little overboard, to be honest, but it’s people’s emails and I need to respect the gravity of that.

moralestapia 11 hours ago

Nice, is this using an offline model? (For the AI)

  • chrisnolet 9 hours ago

    It's using OpenAI's API at the moment, actually. An offline model could _probably_ handle the conversation and tool calling, but it just needs to be really fast to keep up with conversational speeds. (And really, GPT-4o is a bit too slow for my liking in this current iteration. I'm hoping that GPT-4.5 will be faster.)

    I'm writing up a full accounting of the stack for the post above, so check back for that and let me know if that doesn't answer your questions/concerns!

    • moralestapia 8 hours ago

      Interesting, congrats on shipping!