Hacker News — vinext + Cloudflare Workers

NHacker Next

new
past
show
ask
show
jobs
submit

▲Grafeo – A fast, lean, embeddable graph database built in Rust (grafeo.dev)

154 points by 0x1997 7 hours ago | 51 comments

adsharma 6 hours ago [-]

There are 25 graph databases all going me too in the AI/LLM driven cycle.

Writing it in Rust gets visibility because of the popularity of the language on HN.

Here's why we are not doing it for LadybugDB.

Would love to explore a more gradual/incremental path.

Also focusing on just one query language: strongly typed cypher.

https://github.com/LadybugDB/ladybug/discussions/141

tadfisher 6 hours ago [-]

Is LadybugDB not one of these 25 projects?

adsharma 5 hours ago [-]

LadybugDB is backed by this tech (I didn't write it)

https://vldb.org/cidrdb/2023/kuzu-graph-database-management-...

You can judge for yourself what work has been done in the last 5 months. Many short videos here. New open source contributors who I didn't know before ramping up.

https://youtube.com/@ladybugdb

lmeyerov 2 hours ago [-]

Speaking of embeddable, we just announced cypher syntax for gfql, so the first OSS GPU cypher query engine. Typically used with scaleout DBs like databricks & splunk for analytical apps: security/fraud/event/social data analysis pipelines, ML+AI embedding & enrichment pipelines, etc. We originally built it for the compute-tier gap here to help Graphistry users making embeddable interactive GPU graph viz apps and dashboards and not wanting to add an external graph DB phase into their interactive analytics flows.

Single GPU can do 1B+ edges/s, no need for a DB install, and can work straight on your dataframes / apache arrow / parquet: https://pygraphistry.readthedocs.io/en/latest/gfql/benchmark...

We took a multilayer approach to the GPU & vectorization acceleration, including a more parallelism-friendly core algorithm. This makes fancy features pay-as-you-go vs dragging everything down as in most columnar engines that are appearing. Our vectorized core conforms to over half of TCK already, and we are working to add trickier bits on different layers now that flow is established.

The core GFQL engine has been in production for a year or two now with a lot of analyst teams around the world (NATO, banks, US gov, ...) because it is part of Graphistry. The open-source cypher support is us starting to make it easy for others to directly use as well, including LLMs :)

Aurornis 6 hours ago [-]

Does anyone have any experience with this DB? Or context about where it came from?

From the commit history it's obvious that this is an AI coded project. It was started a few months ago, 99% of commits are from 1 contributor, and that 1 contributor has some times committed 100,000 lines of code per week. (EDIT: 200,000 lines of code in the first week)

I'm not anti-LLM, but I've done enough AI coding to know that one person submitting 100,000 lines of code a week is not doing deep thought and review on the AI output. I also know from experience that letting AI code the majority of a complex project leads to something very fragile, overly complicated, and not well thought out. I've been burned enough times by investigating projects that turned out to be AI slop with polished landing pages. In some cases the claimed benchmarks were improperly run or just hallucinated by the AI.

So is anyone actually using this? Or is this someone's personal experiment in building a resume portfolio project by letting AI run against a problem for a few months?

jandrewrogers 6 hours ago [-]

That is a lot of code for what appears to be a vanilla graph database with a conventional architecture. The thing I would be cautious about is that graph database engines in particular are known for hiding many sharp edges without a lot of subtle and sophisticated design. It isn't obvious that the necessary level of attention to detail has been paid here.

adsharma 5 hours ago [-]

Are you talking about Andy Pavlo bet here?

https://news.ycombinator.com/item?id=29737326

Kuzu folks took some of these discussions and implemented them. SIP, ASP joins, factorized joins and WCOJ.

Internally it's structured very similar to DuckDB, except for the differences noted above.

DuckDB 1.5 implemented sideways information passing (SIP). And LadybugDB is bringing in support for DuckDB node tables.

So the idea that graph databases have shaky internals stems primarily from pre 2021 incumbents.

4 more years to go to 2030!

jandrewrogers 4 hours ago [-]

I wasn't referring to the Pavlo bet but I would make the same one! Poor algorithm and architecture scalability is a serious bottleneck. I was part of a research program working on the fundamental computer science of high-scale graph databases ~15 years ago. Even back then we could show that the architectures you mention couldn't scale even in theory. Just about everyone has been re-hashing the same basic design for decades.

As I like to point out, for two decades DARPA has offered to pay many millions of dollars to anyone who can demonstrate a graph database that can handle a sparse trillion-edge graph. That data model easily fits on a single machine. No one has been able to claim the money.

Inexplicably, major advances in this area 15-20 years ago under the auspices of government programs never bled into the academic literature even though it materially improved the situation. (This case is the best example I've seen of obviously valuable advanced research that became lost for mundane reasons, which is pretty wild if you think about it.)

rossjudson 24 minutes ago [-]

I guess it all depends on the meaning of the word "handle", and what the use cases are.

adsharma 2 hours ago [-]

> many millions of dollars to anyone who can demonstrate a graph database that can handle a sparse trillion-edge graph.

I wonder why no one has claimed it. It's possible to compress large graphs to 1 byte per edge via Graph reordering techniques. So a trillion scale graph becomes 1TB, which can fit into high end machines.

Obviously it won't handle high write rates and mutations well. But with Apache Arrow based compression, it's certainly possible to handle read-only and read-mostly graphs.

Also the single machine constraint feels artificial. For any columnar database written in the last 5 years, implementing object store support is tablestakes.

adsharma 5 hours ago [-]

Source: https://www.theregister.com/2023/03/08/great_graph_debate_we...

> There are some additional optimizations that are specific to graphs that a relational DBMS needs to incorporate: [...]

This is essentially what Kuzu implemented and DuckDB tried to implement (DuckPGQ), without touching relational storage.

The jury is out on which one is a better approach.

justonceokay 5 hours ago [-]

Yes a graph database will happily lead you down a n^3 (or worse!) path when trying to query for a single relation if you are not wise about your indexes, etc.

cluckindan 4 hours ago [-]

That sounds like a ”graph” DB which implements edges as separate tables, like building a graph in a standard SQL RDB.

If you wish to avoid that particular caveat, look for a graph DB which materializes edges within vertices/nodes. The obvious caveat there is that the edges are not normalized, which may or may not be an issue for your particulat application.

adsharma 5 hours ago [-]

Are you talking about the query plan for scanning the rel table? Kuzu used a hash index and a join.

Trying to make it optional.

Try

explain match (a)-[b]->(c) return a.rowid, b.rowid, c.rowid;

stult 1 hours ago [-]

It certainly does seem problematic to have a graph database hiding edges, sharp or not

gdotv 6 hours ago [-]

Agreed, there's been a literal explosion in the last 3 months of new graph databases coded from scratch, clearly largely LLM assisted. I'm having to keep track of the industry quite a bit to decide what to add support for on https://gdotv.com and frankly these days it's getting tedious.

piyh 4 hours ago [-]

I'm turning off my brain and using neo4j

UltraSane 3 hours ago [-]

Neo4j is pretty nice.

aorth 3 hours ago [-]

Figurative!

ozgrakkurt 4 hours ago [-]

Using a LLM coded database sounds like hell considering even major databases can have some rough edges and be painful to use.

hrmtst93837 2 hours ago [-]

Six figures a week is a giant red flag. That kind of commit log usually means codegen slop or bulk reformatting, and even if some of it works I wouldn't trust the design, test coverage, or long-term maintenance story enough to put that DB anywhere near prod.

arthurjean 5 hours ago [-]

Sounds about right for someone who ships fast and iterates. 54 days for a v0 that probably needs refactoring isn't that crazy if the dev has a real DB background. We've all seen open source projects drag on for 3 years without shipping anything, that's not necessarily better

Aurornis 2 hours ago [-]

200,000 lines of code on week 1 is not a sign of a quality codebase with careful thought put into it.

> We've all seen open source projects drag on for 3 years without shipping anything, that's not necessarily better

There are more options than “never ship anything” and “use AI to slip 200,000 lines of code into a codebase”

TheJord 44 minutes ago [-]

shipping fast matters a lot less than shipping something you actually understand. 200k lines in a week means nobody knows what's in there, including the author. that's not a codebase, it's a liability

brunoborges 34 minutes ago [-]

Why is everything "... built in Rust" trending so easily on HN?

IshKebab 22 minutes ago [-]

Because Rust is an excellent language that pushes you into the "pit of success", and consequently software written in Rust tends to be fast, robust and easy to deploy.

There's no big mystery. No conspiracy or organised evangelism. Rust is just really good.

satvikpendem 6 hours ago [-]

There seem to be a lot of these, how does it compare to Helix DB for example? Also, why would you ever want to query a database with GraphQL, for which it was explicitly not made for that purpose?

natdempk 3 hours ago [-]

Serious question: are there any actually good and useful graph databases that people would trust in production at reasonable scale and are available as a vendor or as open source? eg. not Meta's TAO

gdotv 4 minutes ago [-]

plenty of those - I've had to work with dozens of different graph databases integrating them on https://gdotv.com, save for maybe 1-2 exceptions in the list of supported databases on our website, they're all production ready and either backed by a vendor or open-source (or sometimes both, e.g. Apache AGE for Azure PostgreSQL). There are some technologies that have been around for a long time but really flying under the radar, despite being used a lot in enterprise (e.g. JanusGraph).

cjlm 3 hours ago [-]

Serious answer: limiting to just Open Source: JanusGraph, DGraph, Apache AGE, HugeGraph, MemGraph and ArcadeDB all meet that criteria.

adsharma 2 hours ago [-]

What is open source and what is a graph database are both hotly debated topics.

Author of ArcadeDB critiques many nominally open source licenses here:

https://www.linkedin.com/posts/garulli_why-arcadedb-will-nev...

What is a graph database is also relevant:

  - Does it need index free adjacency?
  - Does it need to implement compressed sparse rows?
  - Does it need to implement ACID?
  - Does translating Cypher to SQL count as a graph database?

szarnyasg 2 hours ago [-]

That's a difficult question and I would like to avoid giving a direct answer (because I co-lead a nonprofit benchmarking graph databases) but even knowing what you need for a graph database can be a tricky decision. See my FOSDEM 2025 talk, where I tried to make sense of the field:

https://archive.fosdem.org/2025/schedule/event/fosdem-2025-5...

adsharma 2 hours ago [-]

What people perceive as "Facebook production graph" is not just TAO. There is an ecosystem around it and I wrote one piece of it.

Full history here: https://www.linkedin.com/pulse/brief-history-graphs-facebook...

pphysch 3 hours ago [-]

Yeah: Postgres, etc.

When you actually need to run graph algorithms against your relational data, you export the subset of that data into something like Grafeo (embedded mode is a big plus here) and run your analysis.

adsharma 2 hours ago [-]

That importing is expensive and prevents you from handling billion scale graphs.

It's possible to run cypher against duckdb (soon postgres as well via duckdb's postgres extension) without having to import anything. That's a game changer when everything is in the same process.

mark_l_watson 2 hours ago [-]

I just spent an hour with Grafeo, trying to also get the associated library grafeo_langchain working with a local Ollama model. Mixed results. I really like the Python Kuzu graph database, still use it even though the developers no longer support it.

cjlm 3 hours ago [-]

Overwhelmed by the sheer number of graph databases? I released a new site this week that lists and categorises them. https://gdb-engines.com

dbacar 3 hours ago [-]

Did you generate the list using an LLM?

cjlm 1 hours ago [-]

I was inspired by https://arxiv.org/abs/2505.24758 and collated their assessment into a table and then just kept adding databases :)

Claude helped a lot but it's all reviewed and curated by me.

xlii 1 hours ago [-]

I wonder if people are using (or intend to use) vibe-coded projects like the one linked.

I mean - I understand, some people have fun looking at new tech no matter the source, but my question is is there a person who would be designated to pick a GraphQL in language and would ignore all the LLM flags and put it in production.

cluckindan 4 hours ago [-]

The d:Document syntax looks so happy!

3 hours ago [-]

OtomotO 4 hours ago [-]

Interesting... Need to check how this differs from agdb, with which I had some success for a sideproject in the past.

https://github.com/agnesoft/agdb

Ah, yeah, a different query language.

nexxuz 4 hours ago [-]

I was ready to learn more about this but I saw "written in Rust" and I literally rolled my eyes and said never mind.

ComputerGuru 4 hours ago [-]

I think "written by genAI" should be a bigger turnoff than "written in Rust".

andriy_koval 3 hours ago [-]

alternative opinion:

* it is possible to write high quality software using GenAI

* not using GenAI could mean project won't be competitive in current landscape

quantumHazer 3 hours ago [-]

> not using GenAI could mean project won't be competitive in current landscape

why? this is false in my opinion, iterating fast is not a good indicator of quality nor competitiveness

andriy_koval 35 minutes ago [-]

iterating fast over quality (e.g. refactoring, tests coverage, benchmarks, documentation, trying new nontrivial ideas) is a good indicator of quality.

Aurornis 2 hours ago [-]

> * it is possible to write high quality software using GenAI

From examine this codebase it doesn’t appear to be written carefully with AI.

It looks like code that was promoted into existence as fast as possible.

andriy_koval 26 minutes ago [-]

sure, there are bad genAI projects and there are good genAI projects. You can remove genAI term from previous sentence.

chuckadams 3 hours ago [-]

Too bad you don't do the same for commenting on HN.

measurablefunc 5 hours ago [-]

This looks like another avant-garde "art" project.

takahitoyoneda 3 hours ago [-]

[dead]

aplomb1026 5 hours ago [-]

[dead]

Rendered at 22:07:39 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.