Enh/virtual table enh2#34559

Open
yihaoDeng wants to merge 28 commits into 3.0 from enh/virtualTableEnh2

@yihaoDeng

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

- Extended SQL grammar to support tag column references using the same
  syntax as non-tag column references (col_name FROM db.table.col and
  db.table.col positional form)
- Updated SCreateVSubTableStmt with pSpecificTagRefs and pTagRefs fields
- Modified parTranslater to validate and process tag references through
  checkAndReplaceTagRefs
- Added JSON serialization/deserialization for new tag reference fields
- Added test cases in test_vtable_tag_ref.py
…thout truncation

- parser/parTranslater.c: relax checkColRef to only check type (not bytes)
  for variable-length types (binary/nchar/varchar/varbinary), allowing
  virtual table columns to have different lengths from source columns.

- executor/executil.c: in initQueryTableDataCondWithColArray, use
  TMAX(virtual_bytes, source_bytes) for variable-length types to ensure
  buildBuf allocation is adequate for source data.

- executor/scanoperator.c:
  - createVTableScanInfoFromParam: relax type+bytes check to type-only
    for variable-length types; after createOneDataBlockWithColArray,
    update pResBlock column info.bytes to source table bytes so
    doCopyColVal length check passes.
  - createVTableScanInfoFromBatchParam: populate SColIdPair.type from
    source schema; add same pResBlock info.bytes update loop.

- test: add test_vtable_nchar_length.py covering three scenarios
  (vtable col len > / = / < source col len) with projection, SQL
  functions (length, char_length, lower, upper, concat, substring,
  replace, ascii, position, cast, first, last), filters, and mixed
  cross-table references.
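
The length relaxation described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the C code from the PR: `VAR_LEN_TYPES`, `types_compatible`, and `read_buffer_bytes` are hypothetical names, and the string type names stand in for TDengine's type constants.

```python
# Hypothetical stand-ins for the variable-length type constants.
VAR_LEN_TYPES = {"binary", "nchar", "varchar", "varbinary"}


def types_compatible(vtype, vbytes, stype, sbytes):
    """Type-only check for variable-length types, type+bytes otherwise."""
    if vtype != stype:
        return False
    if vtype in VAR_LEN_TYPES:
        return True  # lengths may differ between virtual and source column
    return vbytes == sbytes


def read_buffer_bytes(col_type, virtual_bytes, source_bytes):
    """Size the read buffer so source data always fits (the TMAX idea)."""
    if col_type in VAR_LEN_TYPES:
        return max(virtual_bytes, source_bytes)
    return virtual_bytes
```

Under these assumptions, a virtual `nchar` column declared shorter than its source still gets a buffer large enough for the source data.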
- Add validateSrcTableColRef to verify source table/column existence
- vtbRefValidateLocal: check local vnode meta first
- vtbRefValidateRemote: fetch DB vgroup info from MNode, calculate vgId,
  fetch table schema from target vnode via RPC, validate column existence
- Fix deadlock: release META_READER_LOCK before RPC calls in
  doOptimizeVTableNameFilter; skip self-RPC when localVgId matches
- Fix double htonl bug in vtbRefFetchTableSchema (vgId encoding)
- Fix rpcMallocCont/taosMemoryFree mismatch causing crash on cross-db refs
- Fix sysTableFillOneVirtualTableRefImpl: correct src_column_name (col 6),
  err_code (col 8) and err_msg (col 9) population
- Populate err_code with TSDB_CODE_PAR_TABLE_NOT_EXIST or
  TSDB_CODE_PAR_INVALID_REF_COLUMN when validation fails
- Introduce a new test suite for validating virtual table referencing in both same-db and cross-db scenarios.
- Implement tests to ensure that references to normal and child tables are correctly validated, checking that valid references report a success code.
- Set up necessary databases and tables for testing, including source tables and virtual tables for comprehensive coverage.
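
To illustrate why a double `htonl` on an ID such as vgId is a bug, here is a small sketch. It does not reproduce the PR's code: `htonl_on_little_endian` is a hypothetical helper that simulates the byte swap on a little-endian host so the result is the same on any machine.

```python
import struct


def htonl_on_little_endian(value: int) -> int:
    """Simulate htonl on a little-endian host: reverse the four bytes."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def ntohl_on_little_endian(value: int) -> int:
    """ntohl is the same byte swap as htonl."""
    return htonl_on_little_endian(value)


vg_id = 3
encoded_once = htonl_on_little_endian(vg_id)          # correct wire encoding
encoded_twice = htonl_on_little_endian(encoded_once)  # second swap undoes the first

# The receiver decodes exactly once in both cases.
decoded_ok = ntohl_on_little_endian(encoded_once)
decoded_bad = ntohl_on_little_endian(encoded_twice)
```

With a single conversion the receiver recovers the original ID; with the conversion applied twice it decodes a byte-swapped value instead.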
# Conflicts:
#	source/libs/parser/inc/sql.y
Copilot AI review requested due to automatic review settings February 11, 2026 13:29
@gemini-code-assist

Summary of Changes

Hello @yihaoDeng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the diagnostics and manageability of virtual tables by introducing a dedicated mechanism to validate their column references. It provides a new system table and a SHOW command that allow users to inspect the health of virtual table definitions, identifying broken links to source data. This is crucial for maintaining data integrity and simplifying debugging in complex virtual table setups, especially across distributed environments.

Highlights

  • New System Table for Virtual Table Reference Validation: Introduced a new system table, information_schema.ins_virtual_tables_referencing, to provide detailed information about the validity of virtual table column references.
  • New SHOW Command: Added a new SQL command SHOW VTABLE VALIDATE FOR <table_name> which allows users to explicitly check the integrity of a virtual table's column references.
  • Comprehensive Reference Validation: The new functionality validates virtual table column references against their source tables and columns, covering scenarios such as source tables in the same database, cross-database, and even cross-vnode references. It reports specific error codes and messages for invalid references, such as when a source table or column no longer exists.
  • Optimized Filtering: Implemented optimized filtering for the new system table, allowing efficient queries when filtering by virtual_table_name.

Changelog
  • include/common/systable.h
    • Added TSDB_INS_TABLE_VIRTUAL_TABLES_REFERENCING macro for the new system table.
  • include/common/tmsg.h
    • Added TSDB_MGMT_TABLE_VIRTUAL_TABLES_REFERENCING to EShowType enum.
    • Added QUERY_NODE_SHOW_VALIDATE_VTABLE_STMT to ENodeType enum.
    • Removed a redundant blank line.
  • include/libs/nodes/cmdnodes.h
    • Defined SValidateTableStmt struct, aliased as SShowValidateVirtualTable, to represent the structure for validating virtual tables.
  • include/util/tdef.h
    • Added TSDB_SHOW_VALIDATE_VIRTUAL_TABLE_ERROR macro for error message buffer size.
  • source/common/src/systable.c
    • Added a duplicate SYSTABLE_SCH_COL_NAME_LEN macro definition.
    • Defined the schema virtualTablesReferencing for the new system table.
    • Integrated virtualTablesReferencing into the infosMeta array.
  • source/dnode/mgmt/node_mgmt/src/dmTransport.c
    • Added TDMT_VND_TABLE_META_RSP to the dmProcessRpcMsg switch statement to handle new RPC responses.
  • source/dnode/mnode/impl/src/mndShow.c
    • Added logic to convertToRetrieveType to handle TSDB_INS_TABLE_VIRTUAL_TABLES_REFERENCING.
  • source/libs/executor/src/sysscanoperator.c
    • Included strings.h, osMemPool.h, and tdef.h headers.
    • Defined SVirtualTableRefInfo struct for virtual table reference details.
    • Added static functions sysTableFillOneVirtualTableRef, sysTableFillOneVirtualTableRefImpl, sysTableIsOperatorCondOnOneVTableName, sysTableIsCondOnOneVTableName, and doOptimizeVTableNameFilter for handling virtual table reference data and filtering.
    • Modified doOptimizeTableNameFilter to consistently use SExtSchema* type.
    • Updated sysTableScanUserCols and sysTableScanUserVcCols to use SExtSchema* type and adjusted function calls.
    • Implemented sysTableScanVirtualTableRef for scanning the new virtual table reference system table.
    • Added static functions vtbRefValidateCallback, vtbRefGetDbVgInfo, vtbRefVgInfoComp, vtbRefHashValueComp, vtbRefGetVgId, vtbRefFetchTableSchema, vtbRefColExistsInSchema, vtbRefValidateLocal, vtbRefValidateRemote, validateSrcTableColRef, and getErrMsgFromCode for virtual table reference validation logic.
    • Modified doSysTableScanNext, resetSysTableScanOperState, and destroySysScanOperator to incorporate handling for TSDB_INS_TABLE_VIRTUAL_TABLES_REFERENCING.
  • source/libs/nodes/src/nodesCodeFuncs.c
    • Added QUERY_NODE_SHOW_VALIDATE_VTABLE_STMT to nodesNodeName function.
    • Implemented showValidateVTableStmtToJson and jsonToShowValidateVirtualTableStmt for JSON serialization/deserialization of the new statement type.
  • source/libs/nodes/src/nodesUtilFuncs.c
    • Added QUERY_NODE_SHOW_VALIDATE_VTABLE_STMT to nodesMakeNode for node creation.
    • Added logic to nodesDestroyNode to free resources for SShowValidateVirtualTable.
  • source/libs/parser/inc/parAst.h
    • Declared createShowValidateVirtualTableStmt function.
  • source/libs/parser/inc/sql.y
    • Added a new grammar rule for CREATE VTABLE with specific_column_ref_list for column references.
    • Introduced a new grammar rule for SHOW VTABLE VALIDATE FOR full_table_name.
  • source/libs/parser/src/parAstCreater.c
    • Modified needDbShowStmt to include QUERY_NODE_SHOW_VALIDATE_VTABLE_STMT.
    • Implemented createShowValidateVirtualTableStmt to create the AST node for the new SHOW command.
  • source/libs/parser/src/parAstParser.c
    • Included cmdnodes.h and tmsg.h headers.
    • Modified collectMetaKeyFromRealTableImpl to recognize TSDB_INS_TABLE_VIRTUAL_TABLES_REFERENCING.
    • Added collectMetaKeyFromShowVirtualTablesReferencing and collectMetaKeyFromShowValidateVtable functions for metadata collection.
    • Adjusted collectMetaKeyFromShowBnodes, collectMetaKeyFromShowCluster, collectMetaKeyFromShowEncryptStatus, and collectMetaKeyFromShowApps for consistent acctId usage.
    • Integrated QUERY_NODE_SHOW_VALIDATE_VTABLE_STMT into collectMetaKeyFromQuery.
  • source/libs/parser/src/parTokenizer.c
    • Added VALIDATE as a new keyword.
  • source/libs/parser/src/parTranslater.c
    • Added an entry for QUERY_NODE_SHOW_VALIDATE_VTABLE_STMT in sysTableShowAdapter.
    • Declared and implemented rewriteShowValidateVtable for translating the new SHOW command into a SELECT statement.
    • Adjusted various function calls and comments for formatting and consistency.
    • Modified sysTableFromVnode, setVnodeSysTableVgroupList, and isSingleTable to include TSDB_INS_TABLE_VIRTUAL_TABLES_REFERENCING.
    • Refined checkColRef function for more robust virtual table column reference validation.
    • Integrated QUERY_NODE_SHOW_VALIDATE_VTABLE_STMT into rewriteQuery.
  • source/libs/parser/src/parUtil.c
    • Modified getInsTagsTableTargetNameFromOp to handle virtual_db_name and virtual_table_name for filtering.
  • source/libs/planner/src/planPhysiCreater.c
    • Modified createSystemTableScanPhysiNode to include TSDB_INS_TABLE_VIRTUAL_TABLES_REFERENCING in its handling.
  • test/cases/05-VirtualTables/test_vtable_validate_referencing.py
    • Added a new Python test file to thoroughly validate the ins_virtual_tables_referencing system table.
    • Tests cover valid references (same-db, cross-db), scenarios with dropped source columns/tables/databases, full scans, filtering by virtual table name, error message population, and mixed valid/invalid references.
  • tests/test_new/case_list_docs/query/hint.md
    • Updated the markdown file, likely to reflect changes in test cases or documentation.
Activity
  • A new system table ins_virtual_tables_referencing has been introduced to provide validation details for virtual table column references.
  • The SHOW VTABLE VALIDATE FOR <table_name> command has been added to expose this validation information to users.
  • Extensive C code changes across various modules (systable, tmsg, cmdnodes, sysscanoperator, nodesCodeFuncs, nodesUtilFuncs, parAst, sql.y, parAstCreater, parAstParser, parTokenizer, parTranslater, parUtil, planPhysiCreater) implement the new system table, command parsing, semantic analysis, and execution logic.
  • A comprehensive Python test suite (test_vtable_validate_referencing.py) has been added to ensure the correctness and robustness of the new validation feature under various conditions, including valid references, broken references due to schema changes, and cross-database/cross-vnode scenarios.
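
As a rough picture of how the SHOW command relates to the system table, the sketch below builds a SELECT over `information_schema.ins_virtual_tables_referencing` filtered by `virtual_db_name` and `virtual_table_name`, the names mentioned in the changelog. The exact statement produced by `rewriteShowValidateVtable` is not shown in this PR page, so the query shape and the helper `rewrite_show_validate` are assumptions.

```python
def rewrite_show_validate(db_name: str, table_name: str) -> str:
    """Build an assumed equivalent SELECT for SHOW VTABLE VALIDATE FOR db.table.

    Illustration only: real code should not format untrusted input into SQL.
    """
    return (
        "SELECT * FROM information_schema.ins_virtual_tables_referencing "
        f"WHERE virtual_db_name = '{db_name}' "
        f"AND virtual_table_name = '{table_name}'"
    )


query = rewrite_show_validate("db1", "vtb1")
```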

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new SHOW VTABLE VALIDATE FOR ... command and a corresponding system table information_schema.ins_virtual_tables_referencing to validate column references in virtual tables, covering cross-vnode and cross-database scenarios. A security audit, however, identified several critical issues, primarily in source/libs/executor/src/sysscanoperator.c. These include a buffer truncation bug, a memory leak in the schema caching mechanism, highly suspicious logic in the virtual child table column filling function that could lead to memory corruption or crashes, and the use of an uninitialized database name in an optimization path for the new system table scan. Furthermore, general code review noted a duplicate macro, a potential null pointer dereference, inconsistent naming, and dead code. Addressing these issues is crucial for the stability and security of this enhancement.

Copilot AI left a comment

Pull request overview

This PR adds validation functionality for virtual table column references through a new system table ins_virtual_tables_referencing and SQL command SHOW VTABLE VALIDATE FOR <table>. The implementation validates whether virtual table column references to source tables/columns are still valid after schema changes.

Changes:

  • Introduces new SHOW VTABLE VALIDATE SQL syntax and ins_virtual_tables_referencing system table
  • Implements validation logic that checks virtual table column references against source tables via local metadata or cross-vnode RPC
  • Adds comprehensive test suite covering validation scenarios (valid references, dropped columns/tables/databases, mixed valid/invalid references)

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| source/libs/parser/inc/sql.y | Adds VALIDATE token and new SHOW VTABLE VALIDATE syntax; also adds problematic CREATE VTABLE variant |
| source/libs/parser/src/parTokenizer.c | Registers VALIDATE keyword token |
| source/libs/parser/src/parAstCreater.c | Creates AST node for SHOW VTABLE VALIDATE statement |
| source/libs/parser/src/parAstParser.c | Collects metadata keys for validation query |
| source/libs/parser/src/parTranslater.c | Translates and validates SHOW VTABLE VALIDATE statement |
| source/libs/parser/src/parUtil.c | Handles virtual_db_name/virtual_table_name filter conditions |
| source/libs/planner/src/planPhysiCreater.c | Routes new system table to vnode execution |
| source/libs/executor/src/sysscanoperator.c | Implements validation logic with local and remote RPC-based checks |
| source/libs/nodes/src/nodesUtilFuncs.c | Handles SShowValidateVirtualTable node creation/destruction |
| source/libs/nodes/src/nodesCodeFuncs.c | Serialization support for new node type |
| source/common/src/systable.c | Defines schema for ins_virtual_tables_referencing table |
| source/dnode/mnode/impl/src/mndShow.c | Routes system table queries to appropriate handler |
| source/dnode/mgmt/node_mgmt/src/dmTransport.c | Allows TABLE_META_RSP message type for validation RPCs |
| include/libs/nodes/cmdnodes.h | Defines SShowValidateVirtualTable structure |
| include/common/tmsg.h | Adds SHOW_VALIDATE_VTABLE_STMT and VIRTUAL_TABLES_REFERENCING enum values |
| include/common/systable.h | Defines system table name constant |
| include/util/tdef.h | Defines error message buffer size |
| test/cases/05-VirtualTables/test_vtable_validate_referencing.py | Comprehensive test suite for validation functionality |
| tests/test_new/case_list_docs/query/hint.md | Formatting change |

When creating a virtual table with tag column references, validate that
the referenced column is actually a tag column, not a data column.

- Add check using getNormalColSchema to detect data columns
- Return distinct error messages:
  - "references column which is not a tag column" for data columns
  - "references non-existent tag" for missing columns
- Add test cases in test_vtable_tag_ref.py
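
The tag-reference check above can be pictured with a short sketch. It is only an illustration in Python of the decision order described in the commit message; `check_tag_ref` and its set-based arguments are hypothetical and do not mirror the parser's actual data structures.

```python
def check_tag_ref(ref_col, data_columns, tag_columns):
    """Return None if ref_col is a valid tag reference, else an error message."""
    if ref_col in tag_columns:
        return None
    if ref_col in data_columns:
        # The column exists but is a data column, not a tag.
        return "references column which is not a tag column"
    return "references non-existent tag"


data_cols = {"ts", "current", "voltage"}
tag_cols = {"location", "group_id"}
```

Distinguishing the two failure cases tells the user whether the name is misspelled or simply points at the wrong kind of column.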
Fix compilation error where STableType was incorrectly used instead
of ETableType in vtbRefCreateSchemaCache and vtbRefGetTableSchemaLocal.

Also adds schema cache functionality for virtual table validation.
When validating virtual table column references, multiple columns may
reference the same remote table. Previously, each column reference
triggered a separate RPC call to fetch the table schema.

This optimization:
- Adds 'ownsSchema' flag to SVtbRefSchemaCache to support deep-copy
  for remote tables (vs shallow copy for local tables)
- Creates vtbRefCreateSchemaCacheFromMetaRsp to build cache from RPC response
- Adds vtbRefGetRemoteCacheEntry/vtbRefPutRemoteCacheEntry helpers
- Modifies vtbRefValidateRemote to cache successful results
- Checks cache before making RPC calls in validateSrcTableColRef

Result: N columns referencing the same remote table = 1 RPC instead of N
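
The caching behaviour described above might look like the following sketch. It is written in Python for brevity and is not the PR's implementation: `RemoteSchemaCache` and `fake_fetch` are hypothetical, with `fake_fetch` standing in for the RPC to the target vnode.

```python
from typing import Callable, Dict, List, Optional, Tuple

Schema = List[str]  # column names only, for brevity


class RemoteSchemaCache:
    """Caches successfully fetched remote schemas, keyed by (db, table)."""

    def __init__(self, fetch: Callable[[str, str], Optional[Schema]]) -> None:
        self._fetch = fetch  # stands in for the schema-fetch RPC
        self._entries: Dict[Tuple[str, str], Schema] = {}
        self.rpc_calls = 0

    def get(self, db: str, table: str) -> Optional[Schema]:
        key = (db, table)
        if key in self._entries:
            return self._entries[key]  # cache hit: no RPC
        self.rpc_calls += 1
        schema = self._fetch(db, table)
        if schema is None:
            return None  # only successful results are cached
        self._entries[key] = list(schema)  # keep an owned copy
        return self._entries[key]


def fake_fetch(db: str, table: str) -> Optional[Schema]:
    return ["ts", "c1", "c2"] if (db, table) == ("db2", "src") else None


cache = RemoteSchemaCache(fake_fetch)
# Three column references to the same remote table trigger one fetch.
results = [col in (cache.get("db2", "src") or []) for col in ("c1", "c2", "c9")]
```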
When validating virtual table column references, the previous implementation
used linear search (O(N)) to find column names in the cached schema. This
becomes inefficient for tables with many columns.

This optimization:
- Adds pColNameIndex (SHashObj*) to SVtbRefSchemaCache
- Creates vtbRefBuildColNameIndex to build hash index from schema array
- Modifies vtbRefCreateSchemaCache to build index for local tables
- Modifies vtbRefCreateSchemaCacheFromMetaRsp to build index for remote tables
- Simplifies vtbRefCheckColumnInCache to use hash lookup (O(1))
- Updates vtbRefFreeSchemaCache to clean up the hash table

Result: Column name lookup goes from O(N) to O(1) on average, which is
especially beneficial for wide tables with many columns.
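
The indexing idea can be sketched in a few lines. This Python version is only an analogy for the hash index described above: `build_col_name_index` and `column_in_cache` are hypothetical names, and a `dict` stands in for the `SHashObj` used in the C code.

```python
def build_col_name_index(schema):
    """Map each column name to its position in the schema array."""
    return {name: pos for pos, (name, _col_type) in enumerate(schema)}


def column_in_cache(index, col_name):
    """Hash lookup instead of scanning the schema array."""
    return col_name in index


schema = [("ts", "timestamp"), ("c1", "int"), ("c2", "nchar")]
index = build_col_name_index(schema)
```

The index is built once per cached schema, so its O(N) construction cost is paid a single time and each later column check is a constant-time lookup on average.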