Skip to content

NJUWallSpider/2025HuaweiCodeCraft

Repository files navigation

CodeCraft Storage System Simulation

Project Overview

This project is a high-performance distributed storage system simulation (CodeCraft) designed to model and optimize object reading, writing, deletion, and garbage collection in a large-scale data storage environment. The system operates under strict time and resource constraints to handle massive concurrent requests, maximizing system throughput and response speed.

The core modules include:

  • Writer (Storage Allocation): Manages the layout and writing of objects onto disks.
  • Reader (Read Scheduling): Optimizes head movement paths to efficiently respond to read requests.
  • Deleter & Collector (Lifecycle Management): Handles logical deletion of objects and garbage collection of physical space.
  • SharedData (Global State Management): Maintains the global state of disks, objects, and requests.

Core Algorithms & Logic Analysis

1. Storage Allocation Strategy: Segregated Fit

The Writer module employs the Segregated Fit strategy to manage disk space, aiming to reduce disk fragmentation and improve the success rate of writing large objects.

  • Logic Flow:
    1. Scan Free Blocks: Iterate through the target disk to identify all continuous free spaces (marked as 0).
    2. Sort: Sort the identified free blocks by length in ascending order.
    3. Best Fit: Iterate through the sorted list of free blocks and select the first block that can accommodate the current object (length >= size).
  • Advantage: This strategy tends to fill small gaps with small objects, preserving larger continuous spaces for large objects, thereby effectively reducing external fragmentation and improving overall space utilization.

2. Read Scheduling Algorithm: Cost-Benefit Scheduling

The Reader module's primary goal is to minimize seek time while maximizing the value of request responses.

  • Scoring Function: The system assigns a score to each object pending read based on the urgency of the request. The scoring function f(x) considers the request wait time x:
    • Short-term wait (x <= 10): Score decays slowly, maintaining high priority.
    • Medium-term wait (10 < x <= 105): Score decays faster.
    • Timeout (x > 105): Score drops to zero; no longer prioritized. Additionally, object size is a factor ((object.size + 1) * 0.5), meaning reading larger objects yields higher returns.
  • Path Optimization:
    • get_nearest_object: Finds the nearest pending read object in the clockwise direction from the current head position.
    • findMaxScoreInterval: Searches for a read interval that maximizes the cumulative Score minus the Cost (Cost considers head movement distance and the reduced state switching cost for continuous reading).

3. Garbage Collection

The Collector module is responsible for periodically reclaiming invalid space.

  • Trigger Mechanism: gc_action is triggered every FrePerSlicing (1800) time slices.
  • Function: Cleans up objects marked for deletion, merges free fragments, and updates physical address mappings to ensure disk availability over long-term operation.

4. Data Correlation Analysis

The system uses data_correlation.cpp to analyze data read patterns across different time slices, calculating Pearson correlation coefficients to identify hot data and correlated data, assisting in pre-fetching or data layout optimization (e.g., assign_read_labels).

Code Structure

File/Directory Description
main.cpp Entry point, responsible for time stepping, event dispatching, and the main loop.
data.h/cpp Defines core data structures (Request, Object) and the singleton class SharedData for global state.
write.h/cpp Writer class definition. write_segregated_fit.cpp contains specific space allocation algorithms.
read_re.h/cpp Reader class definition. Contains read scheduling, scoring functions, and head control logic.
collect.h/cpp Collector class definition. Responsible for garbage collection logic.
delete.h/cpp Deleter class definition. Handles logical deletion requests.
CMakeLists.txt CMake build script.

Build & Run

Requirements

  • C++17 compatible compiler (GCC, Clang, MSVC)
  • CMake 3.8 or higher

Build Instructions

# 1. Create build directory
mkdir build
cd build

# 2. Generate Makefile
cmake ..

# 3. Build executable
make

Run Instructions

The program reads data from standard input (stdin) and outputs results to standard output (stdout) by default. It is typically used with file redirection.

# Assuming input data file is data.in
./code_craft < data.in > result.out

Alternatively, for debugging, you can configure run parameters in your IDE or modify the freopen statement in main.cpp (remember to comment it out before submission).

Notes

  • Do Not Modify: Sections in CMakeLists.txt marked as "Do not modify" (e.g., compiler standard, output path, target name code_craft) must strictly remain unchanged to avoid evaluation failure.
  • Time Constraints: The core loop advances strictly according to the timestamp timestamp, ensuring operations within each time slice are completed efficiently.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors