diff --git a/docs/content_management/content_api/managing_content.md b/docs/content_management/content_api/managing_content.md index a8d6b05007..976b53354c 100644 --- a/docs/content_management/content_api/managing_content.md +++ b/docs/content_management/content_api/managing_content.md @@ -122,7 +122,7 @@ $this->trashService->recover($trashItem, $newParent); ``` You can also search through Trash items and sort the results using several public PHP API Search Criteria and Sort Clauses that have been exposed for `TrashService` queries. -For more information, see [Searching in trash](search_api.md#searching-in-trash). +For more information, see [Search in trash](search_api.md#search-in-trash). ## Content types diff --git a/docs/release_notes/ez_platform_v3.1.md b/docs/release_notes/ez_platform_v3.1.md index 07cc9e3404..a17b4471f4 100644 --- a/docs/release_notes/ez_platform_v3.1.md +++ b/docs/release_notes/ez_platform_v3.1.md @@ -122,7 +122,7 @@ A customizable search controller has been extracted and placed in `ezplatform-se You can now search through the contents of Trash and sort the search results based on a number of Search Criteria and Sort Clauses that can be used by the `\eZ\Publish\API\Repository\TrashService::findTrashItems` method only. -For more information, see [Searching in trash](https://doc.ibexa.co/en/latest/api/public_php_api_search/#searching-in-trash). +For more information, see [Search in trash](https://doc.ibexa.co/en/latest/api/public_php_api_search/#search-in-trash). 
### Repository filtering diff --git a/docs/search/embeddings_reference/embeddings_reference.md b/docs/search/embeddings_reference/embeddings_reference.md new file mode 100644 index 0000000000..021cda7ebd --- /dev/null +++ b/docs/search/embeddings_reference/embeddings_reference.md @@ -0,0 +1,45 @@ +--- +month_change: true +description: Embedding queries, embedding configuration, providers, and embedding search fields +--- + +# Embeddings search reference + +Embeddings are vector representations of content or text that enable semantic similarity search. +The API provides foundational abstractions for embedding-based search, while embedding providers generate the vector representations. + +## EmbeddingQuery + +- [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html): Represents a semantic similarity search request. +It encapsulates an [Embedding](#embedding) instance and supports pagination and aggregations through the same API as standard content queries. +Embedding queries do not support criteria, sort clauses, facet builders, or spellcheck. + +## Embedding + +- [`Ibexa\Contracts\Core\Repository\Values\Content\Query\Embedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-Query-Embedding.html): Represents the semantic input used for similarity search. +Depending on the embedding provider, it can encapsulate text or vector data. + +## Embedding providers + +Embedding providers generate vector representations for inputs. 
+ +### Provider contracts + +- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html): Generates embeddings. + +- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers. + +- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html): Resolves the provider for a given embedding configuration. + +## Embedding fields + +- [`Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-FieldType-EmbeddingFieldFactory.html): Creates dedicated search fields that store embedding vectors. + +## Validation + +- [`Ibexa\Contracts\Core\Repository\Values\Content\QueryValidatorInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-QueryValidatorInterface.html): Validates embedding queries and configurations before they reach the search engine. + +!!! note "Taxonomy embeddings" + +    Searching with embeddings can be used to support the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature. +    The [`Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Taxonomy-Search-Query-Value-TaxonomyEmbedding.html) class allows embedding queries to target taxonomy data. diff --git a/docs/search/search_api.md b/docs/search/search_api.md index 83de734dff..15f2ff5150 100644 --- a/docs/search/search_api.md +++ b/docs/search/search_api.md @@ -1,4 +1,5 @@ --- +month_change: true description: You can search for content, locations and products by using the PHP API. 
Fine-tune the search with Search Criteria, Sort Clauses and Aggregations. --- @@ -18,7 +19,7 @@ The service should be [injected into the constructor of your command or controll `SearchService` is also used in the back office of [[= product_name =]], in components such as Universal Discovery Widget or Sub-items List. -### Performing a search +### Perform a search To search through content you need to create a [`LocationQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-LocationQuery.html) and provide your Search Criteria as a series of Criterion objects. @@ -70,7 +71,7 @@ As such, `query` is recommended when the search is based on user input. The difference between `query` and `filter` is only relevant when using Solr or Elasticsearch search engine. With the Legacy search engine both properties give identical results. -#### Processing large result sets +#### Process large result sets To process a large result set, use [`Ibexa\Contracts\Core\Repository\Iterator\BatchIterator`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Iterator-BatchIterator.html). `BatchIterator` divides the results of search or filtering into smaller batches. @@ -175,7 +176,7 @@ $filter It's recommended to use an IDE that can recognize type hints when working with Repository Filtering. If you try to use an unsupported Criterion or Sort Clause, the IDE indicates an issue. -## Searching in a controller +## Search in controller You can use the `SearchService` or repository filtering in a controller, as long as you provide the required parameters. For example, in the code below, `locationId` is provided to list all children of a location by using the `SearchService`. 
@@ -196,7 +197,7 @@ When using Repository filtering, provide the results of `ContentService::find()` [[= include_file('code_samples/api/public_php_api/src/Controller/CustomFilterController.php', 16, 31) =]] ``` -### Paginating search results +### Paginate search results To paginate search or filtering results, it's recommended to use the [Pagerfanta library](https://github.com/BabDev/Pagerfanta) and [[[= product_name =]]'s adapters for it.](https://github.com/ibexa/core/blob/main/src/lib/Pagination/Pagerfanta/Pagerfanta.php) @@ -258,7 +259,7 @@ that doesn't belong to the provided Section: [[= include_file('code_samples/api/public_php_api/src/Command/FindComplexCommand.php', 46, 54) =]] ``` -### Combining independent Criteria +### Combine independent Criteria Criteria are independent of one another. This can lead to unexpected behavior, for instance because content can have multiple locations. @@ -281,7 +282,7 @@ Even though the location B is hidden, the query finds the content because both c - the content item is visible (it has the visible location A) -## Sorting results +## Sort results To sort the results of a query, use one of more [Sort Clauses](sort_clause_reference.md). @@ -295,27 +296,6 @@ For example, to order search results by their publication date, from oldest to n For the full list and details of available Sort Clauses, see [Sort Clause reference](sort_clause_reference.md). -## Searching in trash - -In the user interface, on the **Trash** screen, you can search for content items, and then sort the results based on different criteria. -To search the trash with the API, use the `TrashService::findInTrash` method to submit a query for content items that are held in trash. -Searching in trash supports a limited set of Criteria and Sort Clauses. -For a list of supported Criteria and Sort Clauses, see [Search in trash reference](search_in_trash_reference.md). - -!!! 
note - -    Searching through the trashed content items operates directly on the database, therefore you cannot use external search engines, such as Solr or Elasticsearch, and it's impossible to reindex the data. - -``` php -[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 4, 6) =]]//... -[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 35, 42) =]] -``` - -!!! caution - -    Make sure that you set the Criterion on the `filter` property. -    It's impossible to use the `query` property, because the search in trash operation filters the database instead of querying. - ## Aggregation !!! caution "Feature support" @@ -378,4 +358,147 @@ $query->aggregations[] = new IntegerRangeAggregation('range', 'person', 'age', `null` means that a range doesn't have an end. In the example all values above (and including) 60 are included in the last range. -See [Agrregation reference](aggregation_reference.md) for details of all available aggregations. +See [Aggregation reference](aggregation_reference.md) for details of all available aggregations. + +## Search with embeddings + +Embeddings are numerical representations that capture the meaning of text, images, or other content. +AI models generate embeddings by converting words or documents into lists of numbers, instead of treating them as plain text. +Such lists, known as vectors, can then be compared to find content with similar meaning. + +Searching with embeddings enables matching content based on meaning rather than exact text matches. +Instead of comparing keywords, the system compares vectors that represent the semantic meaning of content and the query input. + +!!! note "Taxonomy suggestions" + +    Embedding queries have been introduced primarily to support the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature, but you can use them in other scenarios. 
+ +Searching with embeddings can be combined with traditional search criteria and filters, which allows the semantic search to be constrained by content type, location, permissions, or other search criteria. + +An embedding query is represented by the `Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery` value object. +The object encapsulates the vector to search for, along with configuration such as the embedding model and similarity threshold. +The query is validated before being executed to ensure that the embedding configuration is consistent with the system setup. + +The following components are used to build and validate embedding-based queries: + +- [EmbeddingQuery](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html): + Represents a semantic similarity search request. + It contains the input vector and configuration parameters such as the embedding model. + +- [EmbeddingQueryBuilder](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQueryBuilder.html): + A fluent builder for constructing `EmbeddingQuery` instances. + It enforces required parameters and integrates embedding queries with the search query pipeline. + +- [QueryValidatorInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-QueryValidatorInterface.html): + Validates embedding queries before they are passed to the search engine. + Implementations ensure that the embedding model exists and that vector dimensions match the configured embedding field. + + +### Use embedding queries in search + +Embedding queries are executed through the search API in the same way as other search requests. +You build an `EmbeddingQuery` instance by using a builder and pass it to the search service. +Embedding queries can also be combined with filters and search criteria to narrow down results, such as by content type, location, or permissions. 
+ +``` php +use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder; +use Ibexa\Contracts\Core\Repository\Values\Content\Query\Embedding; +use Ibexa\Contracts\Core\Repository\Values\Content\Query\Aggregation; + +// Create an embedding object that represents the search input +$embedding = new Embedding('Find content similar to this text'); + +// Build the embedding query by using the fluent builder +$embeddingQuery = EmbeddingQueryBuilder::create() +    ->withEmbedding($embedding) +    ->setLimit(10)        // maximum number of results +    ->setOffset(0)        // result offset for pagination +    ->setPerformCount(true) // optionally count total matching items +    ->setAggregations([ +        new Aggregation('count_by_type'), +    ]) +    ->build(); + +// Execute the query through the search service +$results = $searchService->findContent($embeddingQuery); +``` + +The `EmbeddingQueryBuilder` ensures that the query is correctly configured before execution. + +!!! note "Embedding query properties" + +    Embedding queries do not allow standard Query properties such as `query`, `sortClauses`, `facetBuilders`, or `spellcheck`. + +### Embedding configuration and providers + +Models used to resolve embedding queries must be configured in [system configuration](configuration.md). +Each entry under `embedding_models` defines the model's name, vector dimensionality, the field suffix used in the search index, and the embedding provider that generates vectors. + +``` yaml +ibexa: +    system: +        default: +            embedding_models: +                text-embedding-3-small: +                    name: 'text-embedding-3-small' +                    dimensions: 1536 +                    field_suffix: '3small' +                    embedding_provider: 'ibexa_openai' +``` + +For a real-life example of embedding configuration, see [Taxonomy suggestions](taxonomy.md#change-the-embedding-generation-model). + +Embedding providers implement the contract for generating vector representations of input data. +At runtime, the system resolves the right provider and delegates embedding generation to it. 
+ +- [EmbeddingConfigurationInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingConfigurationInterface.html) defines how embedding models are configured in the system (model name, vector dimensionality, provider reference, field suffix). + +- [EmbeddingProviderInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html) is the runtime contract for generating vector representations from text or other inputs. + +- [EmbeddingProviderRegistryInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html) lists all available embedding providers. + +- [EmbeddingProviderResolverInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html) determines which provider should be used for a given embedding configuration. + +### Embedding fields + +Embedding vectors are stored in dedicated search fields that are created by `Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory`. +These fields are then used by the search engine to perform vector similarity comparisons when embedding queries are executed. + +``` php +use Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory; +use Ibexa\Contracts\Core\Search\Embedding\EmbeddingConfigurationInterface; + +// $config is an existing EmbeddingConfigurationInterface +$factory = new EmbeddingFieldFactory($config); + +// Create a default embedding field (type derived from config suffix) +$embeddingField = $factory->create(); +echo $embeddingField->getType(); // for example, "ibexa_dense_vector_model_123" + +// Create a custom embedding field with a specific type +$customField = $factory->create('custom_embedding_type'); +echo $customField->getType(); // "custom_embedding_type" +``` + +For more information, see [Embeddings reference](embeddings_reference.md). 
+ +## Search in trash + +In the user interface, on the **Trash** screen, you can search for content items, and then sort the results based on different criteria. +To search the trash with the API, use the `TrashService::findInTrash` method to submit a query for content items that are held in trash. +Searching in trash supports a limited set of Criteria and Sort Clauses. +For a list of supported Criteria and Sort Clauses, see [Search in trash reference](search_in_trash_reference.md). + +!!! note + + Searching through the trashed content items operates directly on the database, therefore you cannot use external search engines, such as Solr or Elasticsearch, and it's impossible to reindex the data. + +``` php +[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 4, 6) =]]//... +[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 35, 42) =]] +``` + +!!! caution + + Make sure that you set the Criterion on the `filter` property. + It's impossible to use the `query` property, because the search in trash operation filters the database instead of querying. diff --git a/docs/search/search_criteria_and_sort_clauses.md b/docs/search/search_criteria_and_sort_clauses.md index a6a624322b..b772074ab1 100644 --- a/docs/search/search_criteria_and_sort_clauses.md +++ b/docs/search/search_criteria_and_sort_clauses.md @@ -79,7 +79,7 @@ Available tags for Sort Clause handlers in Legacy Storage Engine are: - for Criterion handlers: `ibexa.core.trash.search.legacy.gateway.criterion_handler` - for Sort Clause handlers: `ibexa.core.trash.search.legacy.gateway.sort_clause_handler` - For more information about searching for content items in Trash, see [Searching in trash](search_api.md#searching-in-trash). + For more information about searching for content items in Trash, see [Search in trash](search_api.md#search-in-trash). 
For more information about the Criteria and Sort Clauses that are supported when searching for trashed content items, see [Searching in trash reference](search_in_trash_reference.md). diff --git a/docs/search/search_in_trash_reference.md b/docs/search/search_in_trash_reference.md index 681f53eaa5..59f7b80aca 100644 --- a/docs/search/search_in_trash_reference.md +++ b/docs/search/search_in_trash_reference.md @@ -6,7 +6,7 @@ month_change: false # Search in trash reference -When you [search for content items that are held in trash](search_api.md#searching-in-trash), you can apply only a limited subset of Search Criteria and Sort Clauses +When you [search for content items that are held in trash](search_api.md#search-in-trash), you can apply only a limited subset of Search Criteria and Sort Clauses which can be used by [`Ibexa\Contracts\Core\Repository\TrashService::findTrashItems`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-TrashService.html#method_findTrashItems). Some sort clauses are exclusive to trash search. diff --git a/mkdocs.yml b/mkdocs.yml index 77a37fda43..0a5abe7762 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -798,6 +798,7 @@ nav: - ProductPriceRangeAggregation: search/aggregation_reference/productpricerange_aggregation.md - ProductTypeTermAggregation: search/aggregation_reference/producttypeterm_aggregation.md - TaxonomyEntryIdAggregation: search/aggregation_reference/taxonomyentryid_aggregation.md + - Embeddings search reference: search/embeddings_reference/embeddings_reference.md - Search in trash reference: search/search_in_trash_reference.md - Extend search: - Create custom Search Criterion: search/extensibility/create_custom_search_criterion.md