Collection: making the module provider agnostic #548
base: main
📝 Walkthrough

Adds a provider abstraction and registry, migrates the Collection schema to include provider/name/description and remove organization_id, updates create/delete flows to use providers, enforces per-project unique collection names, and updates tests/utilities to the provider-driven flow.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as API Client
    participant Route as Collections Route
    participant Registry as Provider Registry
    participant Provider as Provider Impl
    participant Storage as Cloud Storage
    participant DB as Database
    Client->>Route: POST /collections (name, documents, provider)
    Route->>Route: ensure_unique_name(project_id, name)
    Route->>DB: enqueue create job / persist job record
    Route->>Registry: get_llm_provider(provider, project_id, org_id)
    Registry->>DB: fetch provider credentials
    Registry->>Provider: instantiate provider
    Route->>Provider: provider.create(request, storage, document_crud)
    Provider->>Provider: batch_documents()
    Provider->>Storage: create/upload vector store
    alt with assistant
        Provider->>Storage: create assistant
    end
    Provider-->>Route: return llm_service_id & llm_service_name
    Route->>DB: persist Collection (provider, name, description, ids)
    Route-->>Client: send success callback / job completion
```
```mermaid
sequenceDiagram
    participant Client as API Client
    participant Route as Collections Route
    participant Registry as Provider Registry
    participant Provider as Provider Impl
    participant Storage as Cloud Storage
    participant DB as Database
    Client->>Route: DELETE /collections/{id}
    Route->>DB: fetch Collection (provider, llm_service_name, ids)
    Route->>DB: enqueue delete job
    Route->>Registry: get_llm_provider(collection.provider, project_id, org_id)
    Registry->>DB: fetch provider credentials
    Registry->>Provider: instantiate provider
    Route->>Provider: provider.delete(collection)
    alt llm_service_name indicates assistant
        Provider->>Storage: delete assistant
    else
        Provider->>Storage: delete vector store
    end
    Provider-->>Route: deletion success
    Route->>DB: remove Collection record
    Route-->>Client: send success callback
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/tests/api/routes/test_assistants.py (1)
1-10: Remove the duplicate `patch` import.

Line 9 redefines `patch`, already imported on Line 1.

🧹 Proposed fix

```diff
-from unittest.mock import patch
 from typing import Any
 from uuid import uuid4
@@
-from unittest.mock import patch
 from app.tests.utils.llm_provider import mock_openai_assistant
```

backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)
18-24: Double creation of collection will cause errors.

`get_assistant_collection` already persists the collection to the database (per the relevant snippet, it calls `CollectionCrud(db, project.id).create(collection)`). Calling `crud.create(collection, documents)` on line 24 attempts to create the same collection again, which will likely cause a primary-key constraint violation or duplicate entries.

🐛 Proposed fix: use the returned collection without re-creating

```diff
 def create_collections(db: Session, n: int) -> Collection:
     crud = None
     project = get_project(db)
     openai_mock = OpenAIMock()
     with openai_mock.router:
         client = OpenAI(api_key="sk-test-key")
         for _ in range(n):
             collection = get_assistant_collection(db, project=project)
             store = DocumentStore(db, project_id=collection.project_id)
             documents = store.fill(1)
             if crud is None:
                 crud = CollectionCrud(db, collection.project_id)
-            crud.create(collection, documents)
+            # Collection is already created by get_assistant_collection
+            # If documents need to be associated, use a different method
+            # or modify get_assistant_collection to accept documents
     return crud.project_id
```

backend/app/tests/services/collections/test_delete_collection.py (1)
1-12: Critical: Missing `Session` import causing pipeline failure.

The pipeline failure indicates `Session` is not defined. The test functions use `db: Session` type hints but `Session` is not imported.

Proposed fix

```diff
 from typing import Any
 from unittest.mock import patch, MagicMock
 from uuid import uuid4, UUID
+from sqlmodel import Session

 from app.models.collection import DeletionRequest
-
 from app.tests.utils.utils import get_project
```
🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/041_extend_collection_table_for_provider_.py`:
- Around line 31-40: The migration adds a NOT NULL column "provider" via
op.add_column using provider_type which will fail on existing rows; change the
migration to add the column as nullable (set nullable=True) or with a
server_default, then run the UPDATE (op.execute("UPDATE collection SET provider
= 'OPENAI' WHERE provider IS NULL")), and finally call
op.alter_column("collection", "provider", existing_type=provider_type,
nullable=False) to enforce NOT NULL; reference the op.add_column call that uses
provider_type and the op.execute update and add the op.alter_column step (or
alternatively add server_default in op.add_column) to fix the ordering and avoid
migration failures.
In `@backend/app/models/collection.py`:
- Around line 31-37: The provider field on the Collection model is incorrectly
defined as a one-element tuple and uses a set for sa_column_kwargs; update the
ProviderType Field declaration (the provider attribute) to return just
Field(...) (remove the trailing comma and surrounding parentheses so it's not a
tuple) and change sa_column_kwargs from {"LLM provider used for this
collection"} to {"comment": "LLM provider used for this collection"} so
SQLAlchemy gets a proper dict comment.
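The trailing-comma pitfall behind this fix is easy to reproduce in plain Python; the values below are illustrative stand-ins, not the real `Field(...)` call:

```python
# A trailing comma after a parenthesized value creates a one-element tuple,
# which is what the Collection.provider annotation accidentally produced.
broken = (
    "Field placeholder",   # hypothetical stand-in for Field(...)
)
fixed = (
    "Field placeholder"    # no trailing comma: just the value itself
)

assert isinstance(broken, tuple) and len(broken) == 1
assert isinstance(fixed, str)
```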
In `@backend/app/services/collections/create_collection.py`:
- Around line 158-161: Replace the explicit equality check against True with a
direct truthiness test: locate the conditional using with_assistant (the if
statement that currently reads with_assistant == True) in create_collection
(affecting creation_request.provider assignment) and change it to use if
with_assistant: (removing the unnecessary parentheses/equality comparison) so
the block sets creation_request.provider = "openai" when with_assistant is
truthy.
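A quick sketch of why `== True` is fragile compared to a plain truthiness test (the value here is hypothetical):

```python
with_assistant = "yes"  # truthy, but not equal to the literal True

equality_check = (with_assistant == True)  # False: a string never equals a bool
truthiness_check = bool(with_assistant)    # True: any non-empty string is truthy

assert equality_check is False
assert truthiness_check is True
```

This is why `if with_assistant:` is both more idiomatic and more correct than `if with_assistant == True:` when the flag may be any truthy value.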
In `@backend/app/services/collections/providers/base.py`:
- Around line 22-28: The __init__ method in the provider base class is missing a
return type annotation; update the constructor signature (def __init__(self,
client: Any)) to include an explicit return type of None (def __init__(self,
client: Any) -> None:) so it follows the project's type-hinting guidelines while
leaving the client parameter and assignment (self.client = client) unchanged.
In `@backend/app/services/collections/providers/openai.py`:
- Around line 44-47: The log message prefixes in OpenAIProvider's create method
are incorrect—replace instances of "[OpenAIProvider.execute]" with
"[OpenAIProvider.create]" so the log prefix matches the method name; update all
logger calls inside the create method (the logger.info/error lines that include
vector_store_id, batches, and any other similar messages) to use the correct
"[OpenAIProvider.create]" tag.
- Around line 120-124: The cleanup method is calling _backout with the wrong
arguments; instead of passing result.llm_service_id and result.llm_service_name
it must instantiate the same CRUD class used in delete (based on
result.llm_service_name) and pass that CRUD instance plus the llm_service_id to
_backout; update cleanup to mirror delete’s pattern: determine which CRUD to
construct (e.g., OpenAIAssistantCRUD, PineconeVectorStoreCRUD, etc.) using
result.llm_service_name, create the CRUD instance, then call _backout(crud,
result.llm_service_id).
In `@backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py`:
- Around line 15-20: The test creates an unused OpenAI instance assigned to the
variable client (OpenAI(api_key="sk-test-key")) inside the openai_mock.router
context; remove that unused assignment or, if the OpenAI constructor must run
for its side effects, change the assignment to _ (i.e., assign to a throwaway
variable) so client is not defined but the constructor still runs; update the
code around the OpenAIMock context where client is currently created.
In `@backend/app/tests/utils/collection.py`:
- Around line 69-75: The Collection creation hardcodes llm_service_name via
get_service_name("openai") which mismatches the dynamic provider argument;
update the llm_service_name assignment in the Collection constructor to derive
the service name from the provider parameter (e.g., call get_service_name with
provider or a normalized form of provider) so llm_service_name aligns with the
provider value used for provider and llm_service_id.
In `@backend/app/tests/utils/llm_provider.py`:
- Around line 22-26: The function mock_openai_assistant currently uses a mutable
default for vector_store_ids which can lead to surprising statefulness; change
the signature to use vector_store_ids: Optional[list[str]] = None and inside
mock_openai_assistant initialize vector_store_ids = ["vs_1", "vs_2"] if
vector_store_ids is None (preserving the Optional[list[str]] type hint), so
callers get a fresh list each invocation and you avoid mutating a module-level
default.
🧹 Nitpick comments (13)
backend/app/services/collections/providers/base.py (1)
37-51: Align `create`/`delete`/`cleanup` docstrings with the actual signature/return.

The current docs mention args (`batch_size`, `with_assistant`, `assistant_options`) and returns (`llm_service_id`, `llm_service_name`) that don't exist in the signature and can mislead implementers.

📝 Suggested docstring cleanup

```diff
-    Args:
-        collection_request: Collection parameters (name, description, document list, etc.)
-        storage: Cloud storage instance for file access
-        document_crud: DocumentCrud instance for fetching documents
-        batch_size: Number of documents to process per batch
-        with_assistant: Whether to create an assistant/agent
-        assistant_options: Options for assistant creation (provider-specific)
-
-    Returns:
-        llm_service_id: ID of the resource to delete
-        llm_service_name: Name of the service (determines resource type)
+    Args:
+        collection_request: Collection parameters (name, description, document list, etc.)
+        storage: Cloud storage instance for file access
+        document_crud: DocumentCrud instance for fetching documents
+
+    Returns:
+        Collection created by the provider
```

backend/app/tests/crud/collections/collection/test_crud_collection_create.py (1)
18-24: Prefer `ProviderType.OPENAI` over a raw string.

Using the enum keeps values consistent with the model definition and avoids casing drift.

♻️ Suggested tweak

```diff
-from app.models import DocumentCollection, Collection
+from app.models import DocumentCollection, Collection, ProviderType
@@
-        provider="OPENAI",
+        provider=ProviderType.OPENAI,
```

backend/app/services/collections/providers/registry.py (1)
44-71: Unreachable else branch: `LLMProvider.get()` already raises for unsupported providers.

The else branch (lines 65-69) can never execute because `LLMProvider.get(provider)` on line 47 already raises a `ValueError` for any unsupported provider. This is dead code.

Consider removing the else branch or restructuring the logic to avoid the redundant check:

♻️ Proposed refactor

```diff
 def get_llm_provider(
     session: Session, provider: str, project_id: int, organization_id: int
 ) -> BaseProvider:
     provider_class = LLMProvider.get(provider)
     credentials = get_provider_credential(
         session=session,
         provider=provider,
         project_id=project_id,
         org_id=organization_id,
     )
     if not credentials:
         raise ValueError(
             f"Credentials for provider '{provider}' not configured for this project."
         )
     if provider == LLMProvider.OPENAI:
         if "api_key" not in credentials:
             raise ValueError("OpenAI credentials not configured for this project.")
         client = OpenAI(api_key=credentials["api_key"])
-    else:
-        logger.error(
-            f"[get_llm_provider] Unsupported provider type requested: {provider}"
-        )
-        raise ValueError(f"Provider '{provider}' is not supported.")
+    else:
+        # Future providers will be handled here
+        raise NotImplementedError(f"Client creation for '{provider}' not implemented.")
     return provider_class(client=client)
```

Alternatively, when adding new providers, this pattern ensures extensibility while keeping the else branch meaningful.
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
12-14: Add proper type hints for function parameters.

As per coding guidelines, all function parameters should have type hints. The `client` parameter lacks a type hint, and `project_id` should be `Optional[int]` since it can be `None`.

♻️ Proposed fix

```diff
+from typing import Optional
+
-def get_assistant_collection_for_delete(
-    db: Session, client=None, project_id: int = None
-) -> Collection:
+def get_assistant_collection_for_delete(
+    db: Session, client: Optional[OpenAI] = None, project_id: Optional[int] = None
+) -> Collection:
```

backend/app/services/collections/helpers.py (2)
20-27: Consider raising an error or logging for unknown providers.

`get_service_name` silently returns an empty string for unknown providers. This could lead to subtle bugs where an invalid provider name goes unnoticed. Consider either raising an error or logging a warning.

♻️ Suggested improvement

```diff
 def get_service_name(provider: str) -> str:
     """Get the collection service name for a provider."""
     names = {
         "openai": "openai vector store",
         # "bedrock": "bedrock knowledge base",
         # "gemini": "gemini file search store",
     }
-    return names.get(provider.lower(), "")
+    name = names.get(provider.lower())
+    if name is None:
+        logger.warning(f"[get_service_name] Unknown provider: {provider}")
+        return ""
+    return name
```
116-132: Consider using the `Session` type instead of `SessionDep` for helper functions.

`SessionDep` is typically a FastAPI dependency type alias (e.g., `Annotated[Session, Depends(get_db)]`) meant for route handlers. For a standalone helper function called from various places, the plain `Session` type is more appropriate and avoids coupling to FastAPI's dependency injection.

♻️ Proposed fix

```diff
+from sqlmodel import Session
+
 def ensure_unique_name(
-    session: SessionDep,
+    session: Session,
     project_id: int,
     requested_name: str,
 ) -> str:
```

backend/app/alembic/versions/041_extend_collection_table_for_provider_.py (1)
67-67: Use an explicit constraint name for the unique constraint.

Using `None` as the constraint name causes Alembic to auto-generate a name, which may differ across environments and complicate future migrations. The downgrade at line 98 already expects `collection_name_key`; ensure consistency.

♻️ Proposed fix

```diff
-    op.create_unique_constraint(None, "collection", ["name"])
+    op.create_unique_constraint("collection_name_key", "collection", ["name"])
```

backend/app/tests/services/collections/test_create_collection.py (2)
117-119: Add a return type annotation to the test function.

Per coding guidelines, all functions should have type hints. The `db` parameter also needs a type hint.

Proposed fix

```diff
 def test_execute_job_success_flow_updates_job_and_creates_collection(
-    mock_get_llm_provider, db: Session
-):
+    mock_get_llm_provider: MagicMock, db: Session
+) -> None:
```
193-195: Add type hints to test function parameters.

Per coding guidelines, add type hints to all function parameters and return values.

Proposed fix

```diff
 def test_execute_job_assistant_create_failure_marks_failed_and_deletes_collection(
-    mock_get_llm_provider, db
-):
+    mock_get_llm_provider: MagicMock, db: Session
+) -> None:
```

backend/app/tests/utils/llm_provider.py (1)
176-179: Add a return type annotation.

Per coding guidelines, add return type hints to all functions.

Proposed fix

```diff
 def get_mock_provider(
     llm_service_id: str = "mock_service_id",
     llm_service_name: str = "mock_service_name",
-):
+) -> MagicMock:
```

backend/app/services/collections/create_collection.py (1)
142-142: Unused `task_instance` parameter.

The `task_instance` parameter is flagged as unused by static analysis. If it is required for the Celery task interface signature, consider adding a comment to clarify this, or prefixing it with an underscore (`_task_instance`) to indicate intentional non-use.

backend/app/tests/services/collections/test_delete_collection.py (1)
74-77: Add type hints to test function parameters.

Per coding guidelines, add type hints to all function parameters.

Proposed fix

```diff
 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_success_updates_job_and_calls_delete(
-    mock_get_llm_provider, db
-):
+    mock_get_llm_provider: MagicMock, db: Session
+) -> None:
```

backend/app/models/collection.py (1)
171-177: Align `ProviderOptions` with the `ProviderType` enum for consistency.

The `provider` field in `ProviderOptions` uses `Literal["openai"]` while the `Collection` model uses the `ProviderType` enum. Since `ProviderType` is already defined for this purpose and the `Collection` model's description anticipates multiple providers ("openai", "bedrock", "gemini", etc.), use `ProviderType` instead of `Literal` to maintain consistency and prepare for future provider support.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/tests/services/collections/test_delete_collection.py`:
- Around line 73-76: Update the test function signatures to include typing:
annotate the patched mock parameter as MagicMock, the db parameter as Session,
and add an explicit return type -> None for the listed tests
(test_execute_job_delete_success_updates_job_and_calls_delete,
test_execute_job_delete_failure_marks_job_failed,
test_execute_job_delete_success_with_callback_sends_success_payload,
test_execute_job_delete_remote_failure_with_callback_sends_failure_payload);
import typing symbols if missing (from unittest.mock import MagicMock and from
sqlalchemy.orm import Session) and apply the annotations to the mock and db
parameters and the function return type for each test.
Codecov Report: ❌
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/services/collections/helpers.py (1)
83-102: Add missing type hints and fix the function-name typo.

The function `pick_service_for_documennt` has a typo in its name and is missing type hints for the `session`, `a_crud`, and `v_crud` parameters, as well as a return type annotation. Per coding guidelines, all function parameters and return values require type hints.

Required changes

```diff
-def pick_service_for_documennt(session, doc_id: UUID, a_crud, v_crud):
+def pick_service_for_document(session: SessionDep, doc_id: UUID, a_crud: OpenAIAssistantCrud, v_crud: OpenAIVectorStoreCrud) -> Union[OpenAIAssistantCrud, OpenAIVectorStoreCrud]:
```

Update imports in `backend/app/api/routes/documents.py`:

- Change the import from `pick_service_for_documennt` to `pick_service_for_document` (line 31)
- Change both function calls from `pick_service_for_documennt` to `pick_service_for_document` (lines 190, 223)

backend/app/models/collection.py (2)
87-105: Add an explicit return type to `model_post_init`.

The return type is missing, which violates the repo's type-hint requirement.

🔧 Suggested fix

```diff
-    def model_post_init(self, __context: Any):
+    def model_post_init(self, __context: Any) -> None:
         self.documents = list(set(self.documents))
```

As per coding guidelines, please add explicit return types.

179-188: Add a return type (and a concrete class type) for `extract_super_type`.

This method yields tuples but has no return type hint.

🔧 Suggested fix

```diff
-from typing import Any, Literal
+from typing import Any, Literal, Iterator
@@
-    def extract_super_type(self, cls: "CreationRequest"):
+    def extract_super_type(self, cls: type[SQLModel]) -> Iterator[tuple[str, Any]]:
         for field_name in cls.model_fields.keys():
             field_value = getattr(self, field_name)
             yield (field_name, field_value)
```

As per coding guidelines, please add explicit return types.
🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/041_extend_collection_table_for_provider_.py`:
- Line 68: Replace the global unique constraint on the collection name with a
composite unique constraint for per-project uniqueness: update the migration
call to op.create_unique_constraint (the one currently creating a unique
constraint on "collection" with ["name"]) so it instead creates the constraint
on ["project_id","name"] to match the application logic enforced by
ensure_unique_name and CollectionCrud.exists_by_name.
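The per-project uniqueness the composite constraint enforces can be sketched with stdlib SQLite; the schema below is a hypothetical stand-in for the real `collection` table:

```python
import sqlite3

# UNIQUE(project_id, name) instead of UNIQUE(name): the same collection
# name may appear in different projects, but not twice within one project.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE collection (
           id INTEGER PRIMARY KEY,
           project_id INTEGER NOT NULL,
           name TEXT NOT NULL,
           UNIQUE (project_id, name)
       )"""
)
conn.execute("INSERT INTO collection (project_id, name) VALUES (1, 'docs')")

# Same name in a different project is allowed...
conn.execute("INSERT INTO collection (project_id, name) VALUES (2, 'docs')")
cross_project_ok = True

# ...but a duplicate within the same project is rejected.
try:
    conn.execute("INSERT INTO collection (project_id, name) VALUES (1, 'docs')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

assert cross_project_ok and duplicate_rejected
```

A global `UNIQUE(name)` would have rejected the second insert as well, which is exactly the mismatch with `ensure_unique_name` the comment flags.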
In `@backend/app/services/collections/create_collection.py`:
- Around line 135-143: Annotate the unused job runner arg by importing
typing.Any and changing the execute_job signature to declare task_instance: Any,
then explicitly mark it as intentionally unused inside the function (e.g. assign
it to _ or add a comment like "_ = task_instance # intentionally unused") so
Ruff stops flagging it; keep other parameter annotations and the return type
as-is and reference the execute_job function and the task_instance parameter
when making the change.
In `@backend/app/services/collections/providers/openai.py`:
- Around line 16-21: The OpenAIProvider.__init__ duplicates assigning
self.client after BaseProvider.__init__ already sets it and the constructor
lacks the return type hint; remove the redundant "self.client = client"
assignment in the OpenAIProvider.__init__ and update the method signature to
include the None return type (i.e., def __init__(self, client: OpenAI) -> None:)
while leaving the call to super().__init__(client) intact.
In `@backend/app/tests/services/collections/providers/test_openai.py`:
- Around line 1-13: The test module name collides with another test (causing
pytest collection errors); rename the test file from test_openai.py to a unique
name like test_openai_provider.py, update any references/imports if present, and
rerun test discovery to confirm no duplicate modules; focus on the module that
tests OpenAIProvider and helpers (OpenAIProvider, generate_openai_id,
get_mock_openai_client_with_vector_store) so their imports still resolve after
renaming.
♻️ Duplicate comments (3)
backend/app/tests/services/collections/test_delete_collection.py (2)
73-76: Missing type hint for the `db` parameter.

The `db` parameter is missing the `Session` type annotation. This was flagged in a previous review and should be addressed for consistency with coding guidelines.

```diff
 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_success_updates_job_and_calls_delete(
-    mock_get_llm_provider: MagicMock, db
+    mock_get_llm_provider: MagicMock, db: Session
 ) -> None:
```
139-142: Missing `Session` type hint on the `db` parameter in the remaining test functions.

Apply the same fix to the other test functions:

- `test_execute_job_delete_failure_marks_job_failed` (line 141)
- `test_execute_job_delete_success_with_callback_sends_success_payload` (line 216)
- `test_execute_job_delete_remote_failure_with_callback_sends_failure_payload` (line 298)

Also applies to: 213-217, 293-299
backend/app/models/collection.py (1)
31-37: Fix the `provider` Field tuple wrapping.

Line 31 wraps `Field(...)` in a one-element tuple, which breaks SQLModel field definition and schema mapping.

🐛 Proposed fix

```diff
-    provider: ProviderType = (
-        Field(
-            nullable=False,
-            description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
-            sa_column_kwargs={"comment": "LLM provider used for this collection"},
-        ),
-    )
+    provider: ProviderType = Field(
+        nullable=False,
+        description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
+        sa_column_kwargs={"comment": "LLM provider used for this collection"},
+    )
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/tests/services/collections/providers/test_openai_provider.py`:
- Around line 98-113: The test test_delete_openai_assistant creates a Collection
without required fields; update the Collection instantiation used in
test_delete_openai_assistant (or replace with the test factory from
backend/app/tests/utils/collection.py) to include the non-nullable fields
provider and project_id (e.g., provider=ProviderType.OPENAI or "OPENAI" and a
valid project_id int), so the Collection(...) passed to OpenAIProvider.delete is
fully valid; keep the rest of the test (MagicMock client, OpenAIProvider,
patching OpenAIAssistantCrud, and the final assert) unchanged.
♻️ Duplicate comments (1)
backend/app/tests/utils/llm_provider.py (1)
22-26: Mutable default argument detected.

Using a mutable list as a default argument can cause unexpected behavior. Replace it with `None` and initialize within the function.

Proposed fix

```diff
 def mock_openai_assistant(
     assistant_id: str = "assistant_mock",
-    vector_store_ids: Optional[list[str]] = ["vs_1", "vs_2"],
+    vector_store_ids: Optional[list[str]] = None,
     max_num_results: int = 30,
 ) -> OpenAIAssistant:
+    if vector_store_ids is None:
+        vector_store_ids = ["vs_1", "vs_2"]
     return OpenAIAssistant(
```
🧹 Nitpick comments (2)
backend/app/tests/utils/llm_provider.py (2)
176-179: Add a return type hint to `get_mock_provider`.

Per coding guidelines, all function return values should have type hints.

Proposed fix

```diff
 def get_mock_provider(
     llm_service_id: str = "mock_service_id",
     llm_service_name: str = "mock_service_name",
-):
+) -> MagicMock:
```
111-134: `get_mock_openai_client_with_vector_store` already includes the `-> MagicMock` return type hint; no change needed. LGTM.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/tests/api/routes/collections/test_collection_info.py (1)
194-230: Undefined name `get_collection` will cause a test failure.

Line 204 calls `get_collection(db, project)`, but this function is not imported. The imports at lines 9-12 only include `get_assistant_collection` and `get_vector_store_collection`. This will raise a `NameError` at runtime.

Additionally, line 197 is missing the type hint for `user_api_key_header` (should be `dict[str, str]` per coding guidelines).

🐛 Proposed fix

```diff
 def test_collection_info_include_docs_and_url(
     client: TestClient,
     db: Session,
-    user_api_key_header,
+    user_api_key_header: dict[str, str],
 ) -> None:
     """
     Test that when include_docs=true and include_url=true,
     the endpoint returns documents with their URLs.
     """
     project = get_project(db, "Dalgo")
-    collection = get_collection(db, project)
+    collection = get_assistant_collection(db, project)
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/models/collection.py`:
- Around line 15-19: ProviderType enum values are uppercase and clash with
ProviderOptions.provider/CreationRequest which expect lowercase literals; change
the enum members in ProviderType (e.g., ProviderType.OPENAI) to use lowercase
strings (e.g., "openai") so Pydantic validation matches the Literal["openai"]
used by ProviderOptions/CreationRequest, and ensure any other members follow the
same lowercase convention.
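The fix above relies on how `str`-mixin enums compare against plain strings; a minimal sketch, with an illustrative enum mirroring (not copied from) the real `ProviderType`:

```python
from enum import Enum

class ProviderType(str, Enum):
    # Member name stays uppercase; the *value* is lowercase so it matches
    # the Literal["openai"] used by ProviderOptions/CreationRequest.
    OPENAI = "openai"

# A str-valued enum member compares equal to its lowercase string value...
assert ProviderType.OPENAI == "openai"
# ...but not to the uppercase spelling, which is what caused the clash.
assert ProviderType.OPENAI != "OPENAI"
# Lookup by value also resolves to the same member.
assert ProviderType("openai") is ProviderType.OPENAI
```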
♻️ Duplicate comments (1)
backend/app/models/collection.py (1)
40-46: Remove the tuple wrapping around the `provider` Field.

This still creates a one-element tuple instead of a `Field`, which breaks SQLModel field definition. This was flagged previously and appears unchanged.

🐛 Proposed fix

```diff
-    provider: ProviderType = (
-        Field(
-            nullable=False,
-            description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
-            sa_column_kwargs={"comment": "LLM provider used for this collection"},
-        ),
-    )
+    provider: ProviderType = Field(
+        nullable=False,
+        description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
+        sa_column_kwargs={"comment": "LLM provider used for this collection"},
+    )
```
🧹 Nitpick comments (1)
backend/app/models/collection.py (1)
112-113: Add missing type hints to comply with project typing rules.

Both `model_post_init` and `extract_super_type` are missing return/parameter typing.

♻️ Proposed fix

```diff
-from typing import Any, Literal
+from typing import Any, Literal, Iterator

-    def model_post_init(self, __context: Any):
+    def model_post_init(self, __context: Any) -> None:
         self.documents = list(set(self.documents))

-    def extract_super_type(self, cls: "CreationRequest"):
+    def extract_super_type(self, cls: type[SQLModel]) -> Iterator[tuple[str, Any]]:
         for field_name in cls.model_fields.keys():
             field_value = getattr(self, field_name)
             yield (field_name, field_value)
```

Please verify this still satisfies the project's typing checks (e.g., mypy/pyright). As per coding guidelines.
Also applies to: 193-196
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/tests/api/routes/collections/test_collection_info.py (1)
194-204: Missing type hint for the `user_api_key_header` parameter.

Line 197 is missing the type annotation `dict[str, str]` that is consistently applied to the same parameter in other test functions in this file (lines 43, 83, 113, 148).

Proposed fix

```diff
 def test_collection_info_include_docs_and_url(
     client: TestClient,
     db: Session,
-    user_api_key_header,
+    user_api_key_header: dict[str, str],
 ) -> None:
```

As per coding guidelines, always add type hints to all function parameters.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
backend/app/services/collections/delete_collection.py (1)
141-149: Add a type hint for `task_instance` (and mark it unused). This param is untyped and unused; add a type hint and explicitly ignore it to satisfy typing rules and linting. As per coding guidelines, please add explicit type hints and mark unused params.
🔧 Suggested fix
```diff
+from typing import Any
@@
 def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
     collection_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> None:
     """Celery worker entrypoint for deleting a collection (both remote and local)."""
+    _ = task_instance  # reserved for task runner hooks; unused for now
```

backend/app/models/collection.py (1)
112-113: Add explicit `-> None` return type to `model_post_init`. The `model_post_init` method is missing its return type annotation. Per coding guidelines and Pydantic conventions, all function return types must be explicitly annotated.

Suggested fix

```diff
-    def model_post_init(self, __context: Any):
+    def model_post_init(self, __context: Any) -> None:
         self.documents = list(set(self.documents))
```

backend/app/services/collections/create_collection.py (1)
165-190: Move `provider.create()` inside the active session context.

`document_crud` is created inside the `with Session(engine)` block but passed to `provider.create()`, which is called outside the block. The `provider.create()` method calls `batch_documents()`, which in turn calls `document_crud.read_each()`. Since `DocumentCrud` stores the session reference and executes queries via `self.session.exec()`, any database access after the session closes will fail.

🔧 Suggested fix

```diff
     with Session(engine) as session:
         collection_job_crud = CollectionJobCrud(session, project_id)
         collection_job = collection_job_crud.read_one(job_uuid)
         collection_job = collection_job_crud.update(
             job_uuid,
             CollectionJobUpdate(
                 task_id=task_id,
                 status=CollectionJobStatus.PROCESSING,
             ),
         )
         storage = get_cloud_storage(session=session, project_id=project_id)
         document_crud = DocumentCrud(session, project_id)
         provider = get_llm_provider(
             session=session,
             provider=creation_request.provider,
             project_id=project_id,
             organization_id=organization_id,
         )
-
-    result = provider.create(
-        collection_request=creation_request,
-        storage=storage,
-        document_crud=document_crud,
-    )
+        result = provider.create(
+            collection_request=creation_request,
+            storage=storage,
+            document_crud=document_crud,
+        )
```

backend/app/tests/api/routes/collections/test_collection_info.py (1)
194-198: Add missing type hint for `user_api_key_header` parameter. This parameter is missing the type annotation `dict[str, str]`, which is used consistently in other test functions in this file (lines 43, 83, 113, 148, 179). As per coding guidelines, all function parameters should have type hints.

Proposed fix

```diff
 def test_collection_info_include_docs_and_url(
     client: TestClient,
     db: Session,
-    user_api_key_header,
+    user_api_key_header: dict[str, str],
 ) -> None:
```

backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
12-31: Add type hints and fix potential `None` value for `project_id`. The function parameters `client` and `project_id` are missing type hints, violating the coding guidelines. Additionally, `project_id` can be `None` by default but is passed directly to `Collection(project_id=project_id, ...)` where the model requires a non-null value. The function retrieves a project via `get_project(db)` but never uses its `id` as a fallback.

Proposed fix

```diff
+from typing import Optional
+from openai import OpenAI as OpenAIClient

 def get_assistant_collection_for_delete(
-    db: Session, client=None, project_id: int = None
+    db: Session,
+    client: Optional[OpenAIClient] = None,
+    project_id: Optional[int] = None,
 ) -> Collection:
     project = get_project(db)
     if client is None:
         client = OpenAI(api_key="test_api_key")
+    if project_id is None:
+        project_id = project.id
     vector_store = client.vector_stores.create()
     assistant = client.beta.assistants.create(
         model="gpt-4o",
         tools=[{"type": "file_search"}],
         tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
     )
     return Collection(
         project_id=project_id,
         llm_service_id=assistant.id,
         llm_service_name="gpt-4o",
         provider=ProviderType.openai,
     )
```

backend/app/tests/utils/collection.py (1)
43-51: Remove `organization_id` field — it no longer exists on the `Collection` model. The `Collection` model was scoped to projects only and no longer has an `organization_id` field. Setting this field during instantiation will cause a model validation error.

Proposed fix

```diff
 collection = Collection(
     id=collection_id or uuid4(),
     project_id=project.id,
-    organization_id=project.organization_id,
     llm_service_name=model,
     llm_service_id=assistant_id,
     provider=ProviderType.openai,
 )
```
🤖 Fix all issues with AI agents
In `@backend/app/api/docs/collections/create.md`:
- Around line 13-18: Update the wording and capitalization for all occurrences
of "Openai" and "Openai vector Store" in the document: replace with "OpenAI" and
"OpenAI vector store" respectively, and tighten the sentence about cleanup to
read more concisely (e.g., "If any LLM service interaction fails, all service
resources are cleaned up; for example, if an OpenAI vector store cannot be
created, any files uploaded to OpenAI are removed. Failures can result from
service outages, invalid parameters, or unsupported document types such as
unparseable PDFs."). Apply the same edits to the similar paragraph referenced
(the other occurrence around lines 20-22).
In `@backend/app/services/collections/create_collection.py`:
- Around line 149-155: Initialize creation_request to None before the try block
so the except path can reference it without causing UnboundLocalError;
specifically, add creation_request = None above the try, keep the
CreationRequest(**request) call inside the try to assign creation_request, and
ensure any failure callback logic (which references creation_request) will
handle a None value safely if construction failed.
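The fix above follows a standard Python pattern: a name assigned inside a `try` block is unbound if the assignment itself raises, so the `except` path must not assume it exists. A minimal, self-contained illustration (all names here are hypothetical stand-ins, not the actual `CreationRequest` code):

```python
def handle(request: dict) -> str:
    creation_request = None  # ensure the name exists even if parsing below fails
    try:
        creation_request = dict(request)  # stand-in for CreationRequest(**request)
        if "name" not in creation_request:
            raise ValueError("missing name")
        return f"created {creation_request['name']}"
    except Exception as exc:
        # Safe: creation_request is None/empty when construction itself failed,
        # so the failure callback can still reference it without UnboundLocalError.
        target = creation_request.get("name", "?") if creation_request else "?"
        return f"failed for {target}: {exc}"

print(handle({"name": "docs"}))  # → created docs
print(handle({}))                # → failed for ?: missing name
```

Without the `creation_request = None` line, the `except` branch would raise `UnboundLocalError` whenever construction failed before the assignment completed.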
♻️ Duplicate comments (3)
backend/app/services/collections/create_collection.py (1)
135-143: Type-annotate and mark `task_instance` as intentionally unused. This param is untyped and unused; add a type hint and explicitly ignore it to satisfy typing rules and linting. As per coding guidelines, please add explicit type hints and mark unused params.
🔧 Suggested fix
```diff
+from typing import Any
@@
 def execute_job(
     request: dict,
     with_assistant: bool,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> None:
@@
-    start_time = time.time()
+    _ = task_instance  # reserved for task runner hooks; unused for now
+    start_time = time.time()
```

backend/app/models/collection.py (1)
40-46: Fix `provider` Field tuple wrapping (SQLModel won't map the column). The trailing comma makes this a tuple rather than a `Field`, so the column mapping will be broken.

🔧 Suggested fix

```diff
-    provider: ProviderType = (
-        Field(
-            nullable=False,
-            description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
-            sa_column_kwargs={"comment": "LLM provider used for this collection"},
-        ),
-    )
+    provider: ProviderType = Field(
+        nullable=False,
+        description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
+        sa_column_kwargs={"comment": "LLM provider used for this collection"},
+    )
```

backend/app/tests/services/collections/test_delete_collection.py (1)
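The bug being flagged is easy to reproduce outside SQLModel: parenthesizing a value and leaving a trailing comma creates a one-element tuple, so the class attribute is no longer the `Field(...)` object itself. A minimal sketch with a plain class (names here are illustrative only):

```python
class Demo:
    # Parenthesized value WITH a trailing comma: Python parses this as a 1-tuple.
    broken = (
        "field-object",
    )
    # Same value without the wrapping comma: the object itself.
    fixed = "field-object"

print(type(Demo.broken).__name__)  # → tuple
print(type(Demo.fixed).__name__)   # → str
```

Because SQLModel inspects the class attribute to build the column, a tuple in that position silently breaks the mapping instead of producing a clear error at definition time.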
73-76: Type-annotate `db` in patched tests. Several test functions still accept `db` without a type hint. As per coding guidelines, add `db: Session` to each.

🔧 Suggested fix

```diff
 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_success_updates_job_and_calls_delete(
-    mock_get_llm_provider: MagicMock, db
+    mock_get_llm_provider: MagicMock, db: Session
 ) -> None:
@@
 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_failure_marks_job_failed(
-    mock_get_llm_provider: MagicMock, db
+    mock_get_llm_provider: MagicMock, db: Session
 ) -> None:
@@
 def test_execute_job_delete_success_with_callback_sends_success_payload(
     mock_send_callback: MagicMock,
     mock_get_llm_provider: MagicMock,
-    db,
+    db: Session,
 ) -> None:
@@
 def test_execute_job_delete_remote_failure_with_callback_sends_failure_payload(
     mock_send_callback: MagicMock,
     mock_get_llm_provider: MagicMock,
-    db,
+    db: Session,
 ) -> None:
```

Also applies to: 138-141, 210-214, 291-295
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/collections/create_collection.py (1)
166-191: Keep the DB session alive for `provider.create` inputs.

`document_crud` (and potentially `storage`) are built inside the `with Session(...)` block but used after the block exits. If `provider.create` touches the DB via `document_crud`, this will hit a closed session at runtime. Move the `provider.create(...)` call into the same session scope (or recreate `DocumentCrud` inside a new session that stays open for the call).

🔧 Suggested fix

```diff
     with Session(engine) as session:
         collection_job_crud = CollectionJobCrud(session, project_id)
         collection_job = collection_job_crud.read_one(job_uuid)
         collection_job = collection_job_crud.update(
             job_uuid,
             CollectionJobUpdate(
                 task_id=task_id,
                 status=CollectionJobStatus.PROCESSING,
             ),
         )
         storage = get_cloud_storage(session=session, project_id=project_id)
         document_crud = DocumentCrud(session, project_id)
         provider = get_llm_provider(
             session=session,
             provider=creation_request.provider,
             project_id=project_id,
             organization_id=organization_id,
         )
-
-    result = provider.create(
-        collection_request=creation_request,
-        storage=storage,
-        document_crud=document_crud,
-    )
+        result = provider.create(
+            collection_request=creation_request,
+            storage=storage,
+            document_crud=document_crud,
+        )
```
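The session-scope hazard described in this comment is general: any object that stores a reference to a context-managed resource fails once the `with` block exits. A minimal illustration with hypothetical stand-ins for the session and CRUD classes (not the project's real `Session`/`DocumentCrud`):

```python
class FakeSession:
    """Stand-in for a DB session managed by a `with` block."""
    def __init__(self) -> None:
        self.closed = False
    def __enter__(self) -> "FakeSession":
        return self
    def __exit__(self, *exc: object) -> None:
        self.closed = True  # the with-block closes the session on exit
    def exec(self, query: str) -> str:
        if self.closed:
            raise RuntimeError("session is closed")
        return f"rows for {query!r}"

class DocumentCrud:
    """Stores the session reference, like the real DocumentCrud does."""
    def __init__(self, session: FakeSession) -> None:
        self.session = session
    def read_each(self) -> str:
        return self.session.exec("SELECT * FROM document")

with FakeSession() as session:
    crud = DocumentCrud(session)
    inside = crud.read_each()  # works: session still open

try:
    crud.read_each()  # fails: the with-block has already closed the session
except RuntimeError as exc:
    print(exc)  # → session is closed
```

This is why the suggested fix moves `provider.create(...)` inside the session block rather than after it.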
♻️ Duplicate comments (1)
backend/app/services/collections/create_collection.py (1)
135-143: Type-annotate and mark `task_instance` as intentionally unused. Ruff flags `task_instance` as unused, and it lacks a type hint. If it's required by the task runner, explicitly annotate it and mark it unused to satisfy linting.

🔧 Suggested fix

```diff
+from typing import Any
@@
 def execute_job(
     request: dict,
     with_assistant: bool,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> None:
@@
+    _ = task_instance  # reserved for task runner hooks; unused for now
```

As per coding guidelines, please add explicit type hints for all parameters and returns.
Collection CRUD -> it is soft deleting the collection entity, so the collection data is usually still present in the DB, but the DB unique constraint on (project_id, name) still treats deleted rows as existing. Suggestion: I am not sure about the whole flow; if you think my analogy is wrong then ignore it. Only soft deleting is happening here in the collection, while the unique constraint is enforced at the DB level.
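If the concern above holds, one common way to reconcile soft deletes with a uniqueness rule is a partial unique index that only covers live rows, so a re-created collection can reuse the name of a soft-deleted one. A minimal sketch using SQLite (the table and column names are assumptions for illustration, not the PR's actual migration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE collection (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        deleted_at TEXT  -- NULL means the row is live
    )"""
)
# Partial unique index: uniqueness applies only to non-deleted rows.
conn.execute(
    "CREATE UNIQUE INDEX uq_collection_project_name "
    "ON collection (project_id, name) WHERE deleted_at IS NULL"
)

conn.execute("INSERT INTO collection (project_id, name) VALUES (1, 'docs')")
# Soft-delete the row; it stays in the table but leaves the index.
conn.execute("UPDATE collection SET deleted_at = '2024-01-01' WHERE name = 'docs'")
# Re-creating a collection with the same name now succeeds.
conn.execute("INSERT INTO collection (project_id, name) VALUES (1, 'docs')")
print(conn.execute("SELECT COUNT(*) FROM collection").fetchone()[0])  # → 2
```

PostgreSQL supports the same `WHERE deleted_at IS NULL` predicate on unique indexes; with a full (non-partial) constraint, the second insert above would fail even though the first row was soft-deleted.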
Summary
Target issue is #489
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
New Features
Tests
Summary by CodeRabbit
New Features
Refactor