This project is a high-performance distributed storage system simulation (CodeCraft) designed to model and optimize object reading, writing, deletion, and garbage collection in a large-scale data storage environment. The system operates under strict time and resource constraints to handle massive concurrent requests, maximizing system throughput and response speed.
The core modules include:
- Writer (Storage Allocation): Manages the layout and writing of objects onto disks.
- Reader (Read Scheduling): Optimizes head movement paths to efficiently respond to read requests.
- Deleter & Collector (Lifecycle Management): Handles logical deletion of objects and garbage collection of physical space.
- SharedData (Global State Management): Maintains the global state of disks, objects, and requests.
The Writer module employs the Segregated Fit strategy to manage disk space, aiming to reduce disk fragmentation and improve the success rate of writing large objects.
- Logic Flow:
- Scan Free Blocks: Iterate through the target disk to identify all continuous free spaces (marked as
0). - Sort: Sort the identified free blocks by length in ascending order.
- Best Fit: Iterate through the sorted list of free blocks and select the first block that can accommodate the current object (
length >= size).
- Scan Free Blocks: Iterate through the target disk to identify all continuous free spaces (marked as
- Advantage: This strategy tends to fill small gaps with small objects, preserving larger continuous spaces for large objects, thereby effectively reducing external fragmentation and improving overall space utilization.
The Reader module's primary goal is to minimize seek time while maximizing the value of request responses.
- Scoring Function:
The system assigns a score to each object pending read based on the urgency of the request. The scoring function
f(x)considers the request wait timex:- Short-term wait (
x <= 10): Score decays slowly, maintaining high priority. - Medium-term wait (
10 < x <= 105): Score decays faster. - Timeout (
x > 105): Score drops to zero; no longer prioritized. Additionally, object size is a factor ((object.size + 1) * 0.5), meaning reading larger objects yields higher returns.
- Short-term wait (
- Path Optimization:
get_nearest_object: Finds the nearest pending read object in the clockwise direction from the current head position.findMaxScoreInterval: Searches for a read interval that maximizes the cumulative Score minus the Cost (Cost considers head movement distance and the reduced state switching cost for continuous reading).
The Collector module is responsible for periodically reclaiming invalid space.
- Trigger Mechanism:
gc_actionis triggered everyFrePerSlicing(1800) time slices. - Function: Cleans up objects marked for deletion, merges free fragments, and updates physical address mappings to ensure disk availability over long-term operation.
The system uses data_correlation.cpp to analyze data read patterns across different time slices, calculating Pearson correlation coefficients to identify hot data and correlated data, assisting in pre-fetching or data layout optimization (e.g., assign_read_labels).
| File/Directory | Description |
|---|---|
main.cpp |
Entry point, responsible for time stepping, event dispatching, and the main loop. |
data.h/cpp |
Defines core data structures (Request, Object) and the singleton class SharedData for global state. |
write.h/cpp |
Writer class definition. write_segregated_fit.cpp contains specific space allocation algorithms. |
read_re.h/cpp |
Reader class definition. Contains read scheduling, scoring functions, and head control logic. |
collect.h/cpp |
Collector class definition. Responsible for garbage collection logic. |
delete.h/cpp |
Deleter class definition. Handles logical deletion requests. |
CMakeLists.txt |
CMake build script. |
- C++17 compatible compiler (GCC, Clang, MSVC)
- CMake 3.8 or higher
# 1. Create build directory
mkdir build
cd build
# 2. Generate Makefile
cmake ..
# 3. Build executable
makeThe program reads data from standard input (stdin) and outputs results to standard output (stdout) by default. It is typically used with file redirection.
# Assuming input data file is data.in
./code_craft < data.in > result.outAlternatively, for debugging, you can configure run parameters in your IDE or modify the freopen statement in main.cpp (remember to comment it out before submission).
- Do Not Modify: Sections in
CMakeLists.txtmarked as "Do not modify" (e.g., compiler standard, output path, target namecode_craft) must strictly remain unchanged to avoid evaluation failure. - Time Constraints: The core loop advances strictly according to the timestamp
timestamp, ensuring operations within each time slice are completed efficiently.