
@bupt-lmy
Contributor

Summary

This PR brings the Java SDK's OceanBase Graph Store to parity with the Python SDK, including a dedicated graph_store.llm.* and graph_store.embedder.* override path, plus test and documentation updates.

What’s included

  • Python-parity OceanBaseGraphStore

    • Implements the entities + relationships schema (graph_entities / graph_relationships)
    • LLM tool-call extraction of entities, relations, and deletions, with robust parsing and fallbacks:
      • extract_entities
      • establish_relationships
      • delete_graph_memory
    • ANN-style entity resolution
      • Best-effort VECTOR(dims) column + CREATE VECTOR INDEX (HNSW/IVF)
      • Falls back to brute-force search over embedding_json when VECTOR is unavailable
    • Graph retrieval
      • Multi-hop traversal (id-based frontier expansion with cycle prevention)
      • BM25 reranking with a lightweight, offline-friendly tokenizer
  • GraphStore config parity

    • Added nested overrides:
      • graph_store.llm.* (provider/apiKey/model/baseUrl/temperature/maxTokens/topP)
      • graph_store.embedder.* (provider/apiKey/model/baseUrl/dims)
    • ConfigLoader supports env keys:
      • GRAPH_STORE_LLM_*, GRAPH_STORE_EMBEDDING_* (with embedder aliases)
  • Core response parity

    • When GraphStore is enabled, the relations field is always returned for:
      • add, search, get_all (an empty list is allowed; the field is never omitted as null)
  • Tests

    • Offline unit test for nested config parsing: GraphStoreConfigOverrideTest
    • Updated integration test usage patterns and assertions:
      • OceanBaseGraphStoreIT (env-gated)
  • Docs

    • README structure updated and GraphStore section refreshed
    • Added GraphStore-specific .env examples for graph_store.llm/embedder overrides
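
For reviewers, the multi-hop retrieval described above can be pictured with a small in-memory sketch. It is not the PR's code: `MultiHopSketch`, `Triple`, `expand`, and `maxHops` are hypothetical names, edges are treated as undirected for reachability, and each hop scans a plain list instead of querying graph_relationships. The `visited` set is what provides cycle prevention: an entity id that has already been reached never re-enters the frontier.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of id-based multi-hop frontier expansion (not the PR's implementation). */
final class MultiHopSketch {

    /** A relationship edge; an illustrative stand-in for a graph_relationships row. */
    static final class Triple {
        final long sourceId;
        final String relation;
        final long destinationId;

        Triple(long sourceId, String relation, long destinationId) {
            this.sourceId = sourceId;
            this.relation = relation;
            this.destinationId = destinationId;
        }
    }

    /**
     * Expands from the seed ids for at most maxHops hops and returns every edge touched,
     * in discovery order. Edges are treated as undirected for reachability.
     */
    static List<Triple> expand(List<Triple> edges, Set<Long> seeds, int maxHops) {
        Set<Long> visited = new HashSet<>(seeds);       // entities already reached (cycle guard)
        Set<Long> frontier = new LinkedHashSet<>(seeds);
        Set<Triple> collected = new LinkedHashSet<>();  // each edge object reported once
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Set<Long> next = new LinkedHashSet<>();
            for (Triple t : edges) {
                boolean fromSrc = frontier.contains(t.sourceId);
                boolean fromDst = frontier.contains(t.destinationId);
                if (!fromSrc && !fromDst) {
                    continue;
                }
                collected.add(t);
                // Only ids never seen before join the next frontier, so cycles terminate.
                if (fromSrc && visited.add(t.destinationId)) {
                    next.add(t.destinationId);
                }
                if (fromDst && visited.add(t.sourceId)) {
                    next.add(t.sourceId);
                }
            }
            frontier = next;
        }
        return new ArrayList<>(collected);
    }
}
```

A real store would presumably fetch only the edges touching the current frontier on each hop (one query per hop) rather than scanning every edge in memory.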
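
The graph_store.llm / graph_store.embedder override precedence can be sketched the same way. This assumes that a non-blank GraphStore-specific env value wins and that anything unset falls back to the Memory-level setting; the exact key suffix (`_MODEL`) and the `GRAPH_STORE_EMBEDDER_*` spelling of the embedder alias are guesses, and `GraphStoreEnvOverride` is a hypothetical helper, not ConfigLoader's actual API.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: resolve a GraphStore override, falling back to the Memory-level value. */
final class GraphStoreEnvOverride {

    /** Returns the first non-blank value among the given env keys, in priority order. */
    static Optional<String> firstSet(Map<String, String> env, String... keys) {
        for (String k : keys) {
            String v = env.get(k);
            if (v != null && !v.isBlank()) {
                return Optional.of(v.trim());
            }
        }
        return Optional.empty();
    }

    /** LLM model: the assumed GRAPH_STORE_LLM_MODEL key, else the Memory-level model. */
    static String llmModel(Map<String, String> env, String memoryLevelModel) {
        return firstSet(env, "GRAPH_STORE_LLM_MODEL").orElse(memoryLevelModel);
    }

    /** Embedder model: GRAPH_STORE_EMBEDDING_MODEL, then an assumed EMBEDDER alias, then the default. */
    static String embedderModel(Map<String, String> env, String memoryLevelModel) {
        return firstSet(env, "GRAPH_STORE_EMBEDDING_MODEL", "GRAPH_STORE_EMBEDDER_MODEL")
                .orElse(memoryLevelModel);
    }
}
```

In practice the map would be `System.getenv()`; passing a `Map` keeps the sketch testable offline, in the same spirit as GraphStoreConfigOverrideTest.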

Why

  • Match the Python powermem Graph Store behavior so that benchmark/server output and graph workflows are consistent across languages.
  • Allow the Graph Store to run with a dedicated model setup (e.g., a Graph-specific Qwen LLM/embedding) without affecting the Memory-level providers.
  • Provide deterministic offline testing and safe, env-gated OceanBase integration tests.

Test plan

  • Offline unit tests:
    • mvn test
  • OceanBase integration tests (env-gated):
    • mvn -Dtest=OceanBaseGraphStoreIT test

Notes / Compatibility

  • Vector index creation is best-effort and automatically falls back to brute-force when VECTOR/index features are not available in the target OceanBase cluster/version.
  • BM25 tokenizer is a lightweight implementation (offline-friendly); ranking may differ slightly from Python’s jieba + rank_bm25 in some edge cases.
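
The brute-force fallback mentioned above is essentially a linear scan with cosine similarity. A minimal sketch, assuming the embedding_json values have already been decoded into `double[]` (JSON parsing omitted) and that resolution picks the single closest stored entity at or above a caller-chosen threshold; `BruteForceEntityMatch`, `bestMatch`, and the threshold parameter are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of brute-force entity resolution by cosine similarity. */
final class BruteForceEntityMatch {

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /**
     * Scans every stored embedding (already decoded from embedding_json) and returns the id
     * of the closest entity whose similarity reaches the threshold, or empty if none does.
     */
    static Optional<String> bestMatch(double[] query, Map<String, double[]> stored, double threshold) {
        String bestId = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : stored.entrySet()) {
            double s = cosine(query, e.getValue());
            if (s >= threshold && s > bestScore) {
                bestScore = s;
                bestId = e.getKey();
            }
        }
        return Optional.ofNullable(bestId);
    }
}
```

Each lookup costs O(N*d) for N stored entities of dimension d, which is why the VECTOR index path is preferred when the cluster supports it.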
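
For the BM25 note, here is a self-contained reranker sketch. Assumptions: the tokenizer (lowercased alphanumeric runs, with each CJK ideograph as its own token) is a hypothetical stand-in and far cruder than jieba's word segmentation; k1 = 1.5 and b = 0.75 are common defaults (they are also rank_bm25's BM25Okapi defaults); and the IDF uses the non-negative form log(1 + (N - n + 0.5) / (n + 0.5)), whereas BM25Okapi uses log((N - n + 0.5) / (n + 0.5)) with an epsilon floor for negative values. Both choices are sources of the kind of edge-case ranking drift the note describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of BM25 scoring with a lightweight, dependency-free tokenizer. */
final class Bm25Sketch {
    static final double K1 = 1.5;  // term-frequency saturation
    static final double B = 0.75;  // document-length normalization

    /** Lowercased alphanumeric runs; each CJK ideograph becomes its own token. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        text.toLowerCase(Locale.ROOT).codePoints().forEach(cp -> {
            if (Character.isIdeographic(cp)) {
                flush(cur, out);
                out.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                cur.appendCodePoint(cp);
            } else {
                flush(cur, out);  // punctuation and whitespace end the current token
            }
        });
        flush(cur, out);
        return out;
    }

    private static void flush(StringBuilder cur, List<String> out) {
        if (cur.length() > 0) {
            out.add(cur.toString());
            cur.setLength(0);
        }
    }

    /** Returns one BM25 score per document, in input order. */
    static double[] score(String query, List<String> docs) {
        List<List<String>> toks = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();  // document frequency per term
        double totalLen = 0;
        for (String d : docs) {
            List<String> t = tokenize(d);
            toks.add(t);
            totalLen += t.size();
            t.stream().distinct().forEach(w -> df.merge(w, 1, Integer::sum));
        }
        int n = docs.size();
        double avgLen = n == 0 ? 0 : totalLen / n;
        double[] scores = new double[n];
        List<String> queryTerms = tokenize(query);
        for (int i = 0; i < n; i++) {
            Map<String, Integer> tf = new HashMap<>();
            for (String w : toks.get(i)) {
                tf.merge(w, 1, Integer::sum);
            }
            double norm = avgLen == 0 ? 1 : toks.get(i).size() / avgLen;
            for (String q : queryTerms) {
                int f = tf.getOrDefault(q, 0);
                if (f == 0) {
                    continue;
                }
                int nq = df.get(q);
                double idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
                scores[i] += idf * f * (K1 + 1) / (f + K1 * (1 - B + B * norm));
            }
        }
        return scores;
    }
}
```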

Implement OceanBase Graph Store with entities+relationships schema, LLM-tools extraction,
ANN-style entity matching with best-effort VECTOR index + fallback, BM25 rerank and multi-hop traversal.
Add graph_store.llm/embedder override config support, update relations output parity, and add tests/docs.
Use readme_EN as readme
@@ -0,0 +1,233 @@
package com.oceanbase.powermem.sdk.storage.memory;

import com.oceanbase.powermem.sdk.storage.base.GraphStore;
Contributor

What problem is this file meant to solve? It doesn't seem to match the Python SDK.

Contributor Author

This is an in-memory GraphStore implementation, mainly for offline/unit tests; it can be either removed or kept.
