@bburda42dot
Owner

No description provided.

vinodreddy-g and others added 10 commits January 19, 2026 07:17
First implementation proposal for faultlib
- Add fault_lib and rust_kvs to dfm_lib library/test deps
- Add serde_json to library/test deps
- Add env_logger, log, tempfile, fault_lib to dfm binary deps
- Add sovd_fm binary target (was missing from BUILD)
- Align BUILD deps with Cargo.toml dependencies
… path, and concurrent tests

- Add RecordingSink, FailingSink, SlowSink mock implementations to test_utils
- Add design_tests.rs validating Reporter API contracts from design.md
- Add error_tests.rs for SinkError propagation and edge cases
- Add concurrent_tests.rs for thread-safety verification
- Add timestamp_tests.rs for lifecycle timing validation
- Add dfm_tests.rs for DFM integration scenarios
- Add safety lints: forbid(unsafe_code), deny(clippy::todo, clippy::unimplemented)
…with backpressure

- Change SinkError::Other/BadDescriptor to use Cow<'static, str> instead of &'static str
- Remove Box::leak() calls that caused memory leaks on every error
- Replace unbounded mpsc::channel with sync_channel(1024) for backpressure
- Add SinkError::QueueFull variant for channel overflow handling
- Replace busy-wait loop with recv_timeout for proper waiting
- Add path validation (MAX_PATH_LENGTH=256, alphanumeric start, valid characters)
- Increase poll interval from 10ms to 50ms to reduce CPU usage
- Handle Weak::upgrade() failure gracefully with proper error message

BREAKING: SinkError no longer implements Copy (Clone still available)
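
A minimal sketch of the backpressure and error-string changes listed above, using only the standard library. The `SinkError` shape here is an assumption built from the variant names in the commit message; `bounded_queue`, `enqueue`, and `bad_path` are hypothetical helpers, not the crate's API.

```rust
use std::borrow::Cow;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical error type; only the variant names come from the commit notes.
/// Holding a `Cow` means the type can be `Clone` but not `Copy`.
#[derive(Debug, Clone, PartialEq)]
pub enum SinkError {
    /// The bounded queue is full; the caller may drop the event or retry later.
    QueueFull,
    /// Free-form text: borrowed for literals, owned for formatted strings.
    Other(Cow<'static, str>),
}

/// Create a bounded queue (the commit uses a capacity of 1024).
pub fn bounded_queue<T>(capacity: usize) -> (SyncSender<T>, Receiver<T>) {
    sync_channel(capacity)
}

/// Non-blocking enqueue: a full queue is reported as `QueueFull`
/// instead of blocking the caller or growing without bound.
pub fn enqueue<T>(tx: &SyncSender<T>, item: T) -> Result<(), SinkError> {
    match tx.try_send(item) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err(SinkError::QueueFull),
        Err(TrySendError::Disconnected(_)) => {
            Err(SinkError::Other(Cow::Borrowed("receiver disconnected")))
        }
    }
}

/// A formatted message is stored as an owned string, so no `Box::leak` is needed.
pub fn bad_path(path: &str) -> SinkError {
    SinkError::Other(Cow::Owned(format!("invalid path: {path}")))
}
```

With a capacity of 2, the third `enqueue` call returns `Err(SinkError::QueueFull)` until the receiver drains an item.
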
- Add AtomicBool shutdown_flag to FaultLibCommunicator
- Worker thread checks flag periodically with timeout-based receive
- Use Acquire/Release ordering for proper synchronization
- DiagnosticFaultManager signals shutdown before joining worker thread
- Prevents deadlock that occurred during drop when worker never exited
- Match all LifecycleStage variants in fault_record_processor (no longer panics)
- Implement delete_fault, delete_all_faults in SOVD fault manager
- Format FaultId::Numeric as hex, FaultId::Uuid as UUID string
- Add CatalogBuildError type with InvalidJson and MissingConfig variants
- Add try_build() fallible API to FaultCatalogBuilder
- build() now returns Result instead of panicking on invalid config
- FaultRecord now populated with current system time (seconds + nanoseconds)
- Add documentation for DebounceMode explaining behavior of each variant
- Add documentation for ResetMode explaining aging and reset policies
- Note: debounce logic is documented but not wired up (see MVP.md for future work)
- Move FaultCatalog, FaultCatalogBuilder, FaultCatalogConfig to common/src/catalog.rs
- Move to_static_short_string/to_static_long_string utilities to common/src/types.rs
- fault_lib/src/catalog.rs now re-exports from common for backward compatibility
- fault_lib/src/utils.rs re-exports utility functions for backward compatibility
- Remove fault_lib dependency from dfm_lib (Cargo.toml and BUILD)
- Update dfm_lib imports to use common::catalog directly
- Move mockall from runtime to dev-dependencies only

This simplifies dependency graph:
  Before: dfm_lib → fault_lib → common
  After:  dfm_lib → common (no fault_lib dependency)
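
The shutdown handshake listed earlier (an `AtomicBool` flag, a timeout-based receive, and signalling before joining) can be sketched roughly as follows. `Worker` is a hypothetical stand-in for the communicator's worker thread, and the message type and queue size are placeholders.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, RecvTimeoutError, SyncSender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the worker owned by the communicator.
pub struct Worker {
    shutdown_flag: Arc<AtomicBool>,
    handle: Option<JoinHandle<usize>>,
}

impl Worker {
    /// Spawn a worker that counts received messages until told to stop.
    pub fn spawn(poll: Duration) -> (Self, SyncSender<u32>) {
        let (tx, rx) = sync_channel::<u32>(16);
        let shutdown_flag = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&shutdown_flag);
        let handle = thread::spawn(move || {
            let mut processed = 0usize;
            // The Acquire load pairs with the Release store in `shutdown`.
            while !flag.load(Ordering::Acquire) {
                match rx.recv_timeout(poll) {
                    Ok(_msg) => processed += 1,
                    // A timeout only means "re-check the flag".
                    Err(RecvTimeoutError::Timeout) => continue,
                    Err(RecvTimeoutError::Disconnected) => break,
                }
            }
            processed
        });
        (Self { shutdown_flag, handle: Some(handle) }, tx)
    }

    /// Signal shutdown first, then join. Joining without the signal would
    /// block for as long as any sender is still alive.
    pub fn shutdown(&mut self) -> usize {
        self.shutdown_flag.store(true, Ordering::Release);
        self.handle
            .take()
            .map(|h| h.join().unwrap_or(0))
            .unwrap_or(0)
    }
}
```

In this sketch the worker exits within roughly one poll interval after `shutdown` is called, even while a sender is still held elsewhere.
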
- Remove unused mockall::automock imports from fault_manager_sink.rs and ipc_worker.rs
- Remove unused SovdEnvData import from dfm_tests.rs
- Remove unused utils::* import from catalog_and_reporter.rs example
- Add #[allow(dead_code)] to get_fault_sink (internal API, kept for future use)
- Add #[allow(dead_code)] to make_reporter_with_descriptor test helper
…tion readiness

- Remove duplicate [[bin]] entries from Cargo.toml files (bin vs example conflict)
- Fix clippy new_without_default warnings for RecordingSink and AtomicCountingSink
- Fix clippy single_match warning in dfm_tests.rs
- Fix examples/BUILD to reference //:Cargo.lock correctly
- Add exports_files for Cargo.lock in root BUILD
- Add data attribute and env for Bazel test data path resolution
- Create test_data_path helper for Cargo/Bazel test compatibility
- Add MVP.md documenting feature status and limitations
- Add CHANGELOG.md with version history
- Update README.md with documentation links
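
One possible shape for a helper like `test_data_path`, shown only as a sketch: the environment variable name `FAULT_TEST_DATA_DIR` is made up for illustration (the commit does not say which variable the Bazel `env` attribute sets), and the fallback assumes Cargo exports `CARGO_MANIFEST_DIR` to the test process.

```rust
use std::env;
use std::path::PathBuf;

/// Pick the base directory: prefer the Bazel-provided one, then Cargo's
/// manifest directory, and fall back to the current directory.
pub fn resolve_base(bazel_dir: Option<String>, cargo_dir: Option<String>) -> PathBuf {
    bazel_dir
        .or(cargo_dir)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("."))
}

/// Resolve a test data file relative to whichever build system runs the test.
pub fn test_data_path(relative: &str) -> PathBuf {
    // Hypothetical variable name, assumed to be set via the Bazel `env` attribute.
    let bazel_dir = env::var("FAULT_TEST_DATA_DIR").ok();
    // Assumed to be present when the test is launched by `cargo test`.
    let cargo_dir = env::var("CARGO_MANIFEST_DIR").ok();
    resolve_base(bazel_dir, cargo_dir).join(relative)
}
```
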

Build status:
- cargo build: PASS (no warnings)
- cargo clippy: PASS (no warnings)
- cargo test: 86 passed, 1 ignored (TDD stub)
- bazel build //src/...: PASS
- bazel build //examples/...: PASS
- bazel test //src/...: 3/3 PASS
@bburda42dot bburda42dot self-assigned this Feb 10, 2026
Implement REQ-4 from design.md: "The debouncing should be in the fault
lib to reduce the traffic on the IPC."

- Add debouncer (Box<dyn Debounce + Send + Sync>) and last_stage fields
  to Reporter struct, initialized from FaultDescriptor config
- Wire debounce filtering into Reporter::publish() — suppressed events
  return Ok(()) silently, reducing IPC traffic to the DFM
- Reset debouncer on Passed→Failed lifecycle transitions so new fault
  occurrences start with a clean debounce window
- Enable previously-ignored TDD test and add 6 new tests covering all
  DebounceMode variants (CountWithinWindow, HoldTime, EdgeWithCooldown),
  lifecycle reset, and pass-through without debounce
- Update all existing test Reporter constructions with new fields
- Update DebounceMode::into() to return Send+Sync trait object
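
To make the publish-side filtering concrete, here is a rough sketch of a count-within-window debouncer wired into a reporter. The `Debounce` method names, the `Stage` enum, and the `delivered` vector (standing in for the sink) are assumptions; the sketch also debounces only `Failed` events, which the commit message does not specify.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical trait shape; the real `Debounce` signature is not shown in the PR.
pub trait Debounce {
    /// Returns true when the event should be forwarded.
    fn on_event(&mut self, now: Instant) -> bool;
    fn reset(&mut self);
}

/// Forward only once `min_count` events were seen within `window`.
pub struct CountWithinWindow {
    min_count: usize,
    window: Duration,
    seen: VecDeque<Instant>,
}

impl CountWithinWindow {
    pub fn new(min_count: usize, window: Duration) -> Self {
        Self { min_count, window, seen: VecDeque::new() }
    }
}

impl Debounce for CountWithinWindow {
    fn on_event(&mut self, now: Instant) -> bool {
        // Drop timestamps that fell out of the window.
        while let Some(&oldest) = self.seen.front() {
            if now.duration_since(oldest) > self.window {
                self.seen.pop_front();
            } else {
                break;
            }
        }
        self.seen.push_back(now);
        self.seen.len() >= self.min_count
    }

    fn reset(&mut self) {
        self.seen.clear();
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stage {
    Passed,
    Failed,
}

pub struct Reporter {
    debouncer: Option<Box<dyn Debounce + Send + Sync>>,
    last_stage: Option<Stage>,
    /// Stands in for the sink: everything that would cross the IPC boundary.
    pub delivered: Vec<Stage>,
}

impl Reporter {
    pub fn new(debouncer: Option<Box<dyn Debounce + Send + Sync>>) -> Self {
        Self { debouncer, last_stage: None, delivered: Vec::new() }
    }

    pub fn publish(&mut self, stage: Stage, now: Instant) -> Result<(), String> {
        // A Passed -> Failed transition starts a clean debounce window.
        let restarted = self.last_stage == Some(Stage::Passed) && stage == Stage::Failed;
        self.last_stage = Some(stage);
        if let Some(d) = self.debouncer.as_mut() {
            if restarted {
                d.reset();
            }
            // Suppressed events return Ok(()) without reaching the sink.
            if stage == Stage::Failed && !d.on_event(now) {
                return Ok(());
            }
        }
        self.delivered.push(stage);
        Ok(())
    }
}
```

With `min_count = 3`, the first two `Failed` reports inside the window are dropped and the third is delivered; a reporter built with `None` passes every event through.
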
Extend LogHook trait with on_publish()/on_error() callbacks (replacing
unused on_report()), add NoOpLogHook default, and wire hook into
Reporter::publish() after sink delivery. Hook is populated from FaultApi
global or set directly for testing. Debounce-suppressed events do not
trigger the hook. Adds 6 REQ-10 design contract tests.

BREAKING CHANGE: LogHook trait signature changed from on_report() to
on_publish()/on_error(). Trait was previously defined but unused.
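
A sketch of how a hook with `on_publish()`/`on_error()` callbacks might be invoked after sink delivery. The parameter types are assumptions (plain string slices instead of the real fault and error types), and `RecordingHook` and `publish_with_hook` are hypothetical names used for illustration.

```rust
use std::sync::Mutex;

/// Assumed trait shape; only the method names come from the commit message.
pub trait LogHook: Send + Sync {
    fn on_publish(&self, fault: &str);
    fn on_error(&self, fault: &str, error: &str);
}

/// Default hook that ignores every callback.
pub struct NoOpLogHook;

impl LogHook for NoOpLogHook {
    fn on_publish(&self, _fault: &str) {}
    fn on_error(&self, _fault: &str, _error: &str) {}
}

/// Hypothetical test hook that records each callback as a line of text.
#[derive(Default)]
pub struct RecordingHook {
    pub lines: Mutex<Vec<String>>,
}

impl LogHook for RecordingHook {
    fn on_publish(&self, fault: &str) {
        self.lines.lock().unwrap().push(format!("publish {fault}"));
    }
    fn on_error(&self, fault: &str, error: &str) {
        self.lines.lock().unwrap().push(format!("error {fault}: {error}"));
    }
}

/// Deliver through `sink` first, then notify the hook about the outcome.
/// A debounce-suppressed event would return before reaching this function.
pub fn publish_with_hook(
    hook: &dyn LogHook,
    fault: &str,
    sink: impl FnOnce(&str) -> Result<(), String>,
) -> Result<(), String> {
    let result = sink(fault);
    match &result {
        Ok(()) => hook.on_publish(fault),
        Err(e) => hook.on_error(fault, e),
    }
    result
}
```
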
…ent IPC failures

Implement REQ-7 fault caching with retry logic in the IPC worker thread,
ensuring faults are not lost when DFM is temporarily unavailable.

Changes:
- Add RetryConfig (fault_lib-internal) with max_retries, cache_capacity,
  retry_interval, and max_retry_interval
- Add CachedFault with exponential backoff: 100ms -> 200ms -> ... -> 5s cap
- Add IpcWorkerState with retry queue (VecDeque), cache eviction (oldest
  first), and transient/permanent error classification
- Change worker loop from blocking recv to recv_timeout(50ms) for periodic
  retry processing
- Only Fault events are retried; Hash checks are time-sensitive and dropped
- Final flush on shutdown attempts to deliver remaining cached faults
- Add RUST_TEST_THREADS=1 to Bazel BUILD for iceoryx2 compatibility
- 16 new tests: backoff, eviction, transient/permanent classification,
  retry success/failure/flaky publisher, backoff timing, multi-event

Non-blocking publish path (REQ-2) unchanged: FaultManagerSink::publish()
still enqueues to mpsc channel and returns immediately.
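
The backoff schedule and oldest-first eviction described in this commit can be sketched as below. The field names follow the commit message, and the 100 ms base and 5 s cap come from the stated schedule; the default `max_retries` and `cache_capacity` values and the `RetryCache` type are placeholders, not the crate's actual values or types.

```rust
use std::collections::VecDeque;
use std::time::Duration;

pub struct RetryConfig {
    pub max_retries: u32,
    pub cache_capacity: usize,
    pub retry_interval: Duration,
    pub max_retry_interval: Duration,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_retries: 10,     // placeholder value
            cache_capacity: 512, // placeholder value
            retry_interval: Duration::from_millis(100),
            max_retry_interval: Duration::from_secs(5),
        }
    }
}

/// Delay before retry number `attempt` (0-based): doubles each time, then caps.
pub fn backoff(cfg: &RetryConfig, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    cfg.retry_interval
        .checked_mul(factor)
        .unwrap_or(cfg.max_retry_interval)
        .min(cfg.max_retry_interval)
}

/// Bounded retry queue that evicts the oldest entry when full.
pub struct RetryCache<T> {
    capacity: usize,
    queue: VecDeque<T>,
}

impl<T> RetryCache<T> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, queue: VecDeque::new() }
    }

    /// Push an item and return whatever had to be dropped to make room.
    pub fn push(&mut self, item: T) -> Option<T> {
        if self.capacity == 0 {
            return Some(item);
        }
        let evicted = if self.queue.len() == self.capacity {
            self.queue.pop_front()
        } else {
            None
        };
        self.queue.push_back(item);
        evicted
    }

    pub fn len(&self) -> usize {
        self.queue.len()
    }
}
```

Under these assumptions the delays run 100 ms, 200 ms, 400 ms, 800 ms, 1.6 s, 3.2 s, and then stay at the 5 s cap.
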
Add debounce support in FaultRecordProcessor:
- FaultKey struct for per-source+fault_id state tracking
- Lazy debouncer creation from catalog's manager_side_debounce config
- Lifecycle transition detection resets debounce on Passed→Failed
- Independent debounce state per source (app isolation)

Uses existing DebounceMode::into() from common crate.

Includes 5 new tests covering all debounce scenarios.
Add aging infrastructure for automatic fault healing:
- OperationCycleTracker: tracks named operation cycles (power, ignition, etc.)
- AgingManager: evaluates ResetPolicy conditions (PowerCycles, OperationCycles, StableFor, ToolOnly)
- AgingState: per-fault aging context (last active cycle/time, healing counters)

Extend ResetTrigger with OperationCycles variant:
- Uses ShortString for cycle_ref (IPC-safe, #[repr(C)] compatible)
- Backward compatible with existing PowerCycles trigger

Extend SovdFaultState with aging counters:
- occurrence_counter, aging_counter, healing_counter
- first_occurrence_secs, last_occurrence_secs (Unix timestamps)
- KVS backward compatibility: missing fields default to 0

Includes comprehensive test coverage for all reset triggers.
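
A simplified sketch of how reset triggers might be evaluated against tracked operation cycles. The type names echo the commit message, but every field, method, and the `"power"` cycle name are assumptions; `String` stands in for the IPC-safe `ShortString`, and timestamps are plain seconds.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical mirror of the reset triggers named in the commit message.
pub enum ResetTrigger {
    PowerCycles(u32),
    OperationCycles { cycle_ref: String, min_cycles: u32 },
    StableFor(Duration),
    ToolOnly,
}

/// Counts how many times each named operation cycle has started.
#[derive(Default)]
pub struct OperationCycleTracker {
    counts: HashMap<String, u64>,
}

impl OperationCycleTracker {
    pub fn start_cycle(&mut self, name: &str) -> u64 {
        let n = self.counts.entry(name.to_string()).or_insert(0);
        *n += 1;
        *n
    }

    pub fn count(&self, name: &str) -> u64 {
        self.counts.get(name).copied().unwrap_or(0)
    }
}

/// Per-fault aging context: a snapshot taken when the fault was last active.
#[derive(Default)]
pub struct AgingState {
    cycles_at_last_active: HashMap<String, u64>,
    last_active_secs: u64,
}

impl AgingState {
    pub fn mark_active(&mut self, tracker: &OperationCycleTracker, now_secs: u64) {
        self.cycles_at_last_active = tracker.counts.clone();
        self.last_active_secs = now_secs;
    }

    fn cycles_since(&self, tracker: &OperationCycleTracker, name: &str) -> u64 {
        let then = self.cycles_at_last_active.get(name).copied().unwrap_or(0);
        tracker.count(name).saturating_sub(then)
    }
}

/// Decide whether a fault may be healed automatically under `trigger`.
pub fn should_heal(
    trigger: &ResetTrigger,
    state: &AgingState,
    tracker: &OperationCycleTracker,
    now_secs: u64,
) -> bool {
    match trigger {
        ResetTrigger::PowerCycles(n) => state.cycles_since(tracker, "power") >= u64::from(*n),
        ResetTrigger::OperationCycles { cycle_ref, min_cycles } => {
            state.cycles_since(tracker, cycle_ref) >= u64::from(*min_cycles)
        }
        ResetTrigger::StableFor(d) => {
            now_secs.saturating_sub(state.last_active_secs) >= d.as_secs()
        }
        // Only an explicit diagnostic tool request clears the fault.
        ResetTrigger::ToolOnly => false,
    }
}
```
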
Add CDA-aligned SovdFaultStatus struct:
- All ISO 14229 DTC status bits as Option<bool>
- compute_mask() for UDS-compliant status byte
- from_state() for SovdFaultState conversion
- to_hash_map() for backward compatibility

Extend SovdFault with richer diagnostics:
- typed_status: Option<SovdFaultStatus> (typed alternative)
- occurrence_counter, aging_counter, healing_counter
- first_occurrence, last_occurrence (ISO 8601 timestamps)
- All existing fields preserved (symptom, translation_id, schema)

Backward compatible: HashMap status still populated via to_hash_map().
Status now includes 'mask' field with hex-encoded status byte.
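
For illustration, a status byte can be assembled from optional bits as sketched below. The bit positions follow the ISO 14229 DTC status byte layout (bit 0 = testFailed through bit 7 = warningIndicatorRequested); the Rust field names, the choice to treat `None` as an unset bit, and the `0x..` text format are assumptions, not taken from the PR.

```rust
/// Hypothetical field names; each maps to one ISO 14229 DTC status bit.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct SovdFaultStatus {
    pub test_failed: Option<bool>,                             // bit 0
    pub test_failed_this_operation_cycle: Option<bool>,        // bit 1
    pub pending_dtc: Option<bool>,                             // bit 2
    pub confirmed_dtc: Option<bool>,                           // bit 3
    pub test_not_completed_since_last_clear: Option<bool>,     // bit 4
    pub test_failed_since_last_clear: Option<bool>,            // bit 5
    pub test_not_completed_this_operation_cycle: Option<bool>, // bit 6
    pub warning_indicator_requested: Option<bool>,             // bit 7
}

impl SovdFaultStatus {
    /// Pack the known bits into one byte; unknown (`None`) bits count as 0.
    pub fn compute_mask(&self) -> u8 {
        let bits = [
            self.test_failed,
            self.test_failed_this_operation_cycle,
            self.pending_dtc,
            self.confirmed_dtc,
            self.test_not_completed_since_last_clear,
            self.test_failed_since_last_clear,
            self.test_not_completed_this_operation_cycle,
            self.warning_indicator_requested,
        ];
        bits.iter().enumerate().fold(0u8, |mask, (i, bit)| {
            if *bit == Some(true) {
                mask | (1u8 << i)
            } else {
                mask
            }
        })
    }

    /// Hex text for the mask; the exact format used by the PR is not shown.
    pub fn mask_hex(&self) -> String {
        format!("0x{:02X}", self.compute_mask())
    }
}
```
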
Add comprehensive unit tests for core types:
- common/fault.rs: FaultId variants, FaultSeverity, LifecycleStage,
  IpcTimestamp, ComplianceTag, FaultType (16 new tests)

Expand dfm_lib SOVD tests:
- SovdFault typed_status population
- Status mask in HashMap (ISO 14229 format)
- Counters exposure (occurrence, aging, healing)
- Existing fields preservation

Total: ~110 tests across workspace (24 common + 43 dfm_lib + 43 fault_lib)
2 participants