Closed
4 changes: 3 additions & 1 deletion bindings/python/src/codelist.rs
@@ -80,7 +80,9 @@ impl PyCodeList {
let codelist_options = CodeListOptions::default();

// Create codelist
let codelist = CodeList::new(name, codelist_type, metadata, Some(codelist_options));
let codelist = CodeList::new(name, codelist_type, metadata, Some(codelist_options))
.map_err(|e| PyValueError::new_err(e.to_string()))?;
Comment on lines +83 to +84
Collaborator

Thanks for adding this in.


Ok(PyCodeList { inner: codelist })
}
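
The change above converts the core library's error into a Python-facing error at the binding boundary before propagating it with `?`. A minimal sketch of that pattern using only the standard library follows; `CoreError`, `BindingError`, `build` and `build_for_binding` are hypothetical stand-ins (for `CodeListError`, PyO3's `PyValueError`, `CodeList::new` and the binding constructor) and are not part of this PR.

```rust
use std::fmt;

// Hypothetical core-library error, standing in for `CodeListError`.
#[derive(Debug)]
struct CoreError(String);

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "core error: {}", self.0)
    }
}

// Hypothetical binding-layer error, standing in for PyO3's `PyValueError`.
#[derive(Debug, PartialEq)]
struct BindingError(String);

// Stand-in for a fallible constructor such as `CodeList::new`.
fn build(name: &str) -> Result<String, CoreError> {
    if name.is_empty() {
        Err(CoreError("name must not be empty".to_string()))
    } else {
        Ok(name.to_string())
    }
}

// Convert the core error into the binding error at the boundary,
// then propagate it to the caller with `?`.
fn build_for_binding(name: &str) -> Result<String, BindingError> {
    let built = build(name).map_err(|e| BindingError(e.to_string()))?;
    Ok(built)
}

fn main() {
    println!("{:?}", build_for_binding("test_codelist"));
    println!("{:?}", build_for_binding(""));
}
```

Converting via `to_string()` keeps the binding layer independent of the core error type, at the cost of discarding the structured error.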

1 change: 1 addition & 0 deletions rust/codelist-rs/Cargo.toml
@@ -14,3 +14,4 @@ thiserror = "2.0.9"
thiserror-ext = "0.3.0"
chrono = { version = "0.4", features = ["serde"] }
indexmap = { version = "2.9.0", features = ["serde"] }
regex = "1.11.1"
2 changes: 1 addition & 1 deletion rust/codelist-rs/examples/sandbox.rs
@@ -6,7 +6,7 @@ use codelist_rs::{
fn main() -> Result<(), CodeListError> {
// Create a new codelist
let mut codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, Metadata::default(), None);
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, Metadata::default(), None)?;

codelist.add_entry("A00".to_string(), Some("Cholera".to_string()), None)?;
codelist.add_entry(
78 changes: 53 additions & 25 deletions rust/codelist-rs/src/codelist.rs
@@ -8,6 +8,7 @@ use std::{
};

use csv::Writer;
use regex::Regex;
use serde::{Deserialize, Serialize};

// Internal imports
@@ -52,15 +53,22 @@ impl CodeList {
codelist_type: CodeListType,
metadata: Metadata,
options: Option<CodeListOptions>,
) -> Self {
CodeList {
) -> Result<Self, CodeListError> {
let options = options.unwrap_or_default();

// Validate custom regex if it has been set
if let Some(regex_str) = &options.custom_regex {
Regex::new(regex_str)?;
}

Ok(CodeList {
Comment on lines +56 to +64
Collaborator

I'm not sure about this. I'd be interested to get @oylenshpeegul's opinion on this as well.

If we go back to the user experience, I think that will guide us. I think in Python we want to be able to do the following:

# Create a new codelist object
c = CodeList("Pneumonia", "ICD10", "Manually created")

# Add a code to the list
c.add_entry("A119", "pneumonia")

# validate the codelist
c.validate_codes() # validates ICD

# validate custom
c.validate_codes(regex_str=r"^[A-Z][0-9]{2}\.[0-9]$")

Users would need to use regex in Python, so we could choose whether it accepts a string or a regex object. Let's maybe start with a regex object and see what that might mean for typing.

I think we want something like a match statement for using the regex or not.

pub trait Validator {
    fn validate_codes(&self, custom_regex: Option<&Regex>) -> Result<(), CodeListValidatorError>;
}

impl Validator for CodeList {
    fn validate_codes(&self, custom_regex: Option<&Regex>) -> Result<(), CodeListValidatorError> {
        match custom_regex {
            Some(regex) => {...actual implementation},
            None => {
                match self.codelist_type {
                    CodeListType::ICD10 => IcdValidator(self).validate_all_code(),
                    CodeListType::SNOMED => SnomedValidator(self).validate_all_code(),
                    CodeListType::OPCS => OpcsValidator(self).validate_all_code(),
                    CodeListType::CTV3 => Ctv3Validator(self).validate_all_code(),
                }
            }
        }
    }
}

With this in mind, CodeListOptions might not be the best place for the regex.

What do you think?

Collaborator

That's a good point! If we want to serialize CodeListOptions, we can't have an Option<Regex>, as @em-baggie said. But if we yank it out of there and make it a parameter instead, then why not?

I don't think I understand the Python part of your comment. Does PyO3 even translate regexes between Python and Rust? Python regexes are the fancy backtracking kind, whereas in Rust they are the more restricted finite automata kind.

Collaborator Author

@em-baggie, Jun 17, 2025

My thinking was that the regex could be included in CodeListOptions, which is customisable, and that this would link the regex to the particular codelist. But I realise the way I've implemented it means that if the regex is set, you can only validate with the regex and not with the normal validation that depends on the codelist type. Using it as a parameter is a lot more flexible, but the regex will not be 'saved' in the codelist struct. I guess validation notes can be added to the metadata to document whether different validation methods have been used.

I think with PyO3, Python can just pass a string and, under the hood, Rust can convert it to a regex within the method. Or maybe we can just pass a regex from Python, though I'm not exactly sure how that would work. To be honest, I think Caroline's method is a lot simpler and I probably overcomplicated this!

Collaborator Author

Is CodeListOptions something we want to expose to the Python API so it can be altered by the user? Maybe I'm misunderstanding the purpose of these options 🤔

Collaborator Author

I'll have a go at implementing the other approach in a separate PR, then we can see them both.

Collaborator

We will want to expose CodeListOptions at some point in the Python API, I think.

name,
entries: BTreeMap::new(),
codelist_type,
metadata,
logs: Vec::new(),
codelist_options: options.unwrap_or_default(),
}
codelist_options: options,
})
}
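
The alternative raised in the review thread, passing an optional custom rule to `validate_codes` and falling back to the type-specific validator when none is given, might look roughly like the sketch below. Everything here is a simplified assumption: the types are trimmed-down stand-ins for the crate's own, a plain predicate (`Option<&dyn Fn(&str) -> bool>`) takes the place of `Option<&Regex>` so the example needs no external crate, and the `looks_like_*` checks are illustrative only, not the real ICD-10 or SNOMED rules.

```rust
// Hypothetical, trimmed-down stand-ins for the crate's types.
#[derive(Debug, Clone, Copy)]
enum CodeListType {
    Icd10,
    Snomed,
}

struct CodeList {
    codelist_type: CodeListType,
    codes: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct ValidationError {
    invalid_codes: Vec<String>,
}

// Deliberately simplified checks; NOT the real ICD-10 or SNOMED rules.
fn looks_like_icd10(code: &str) -> bool {
    let b = code.as_bytes();
    b.len() >= 3 && b[0].is_ascii_uppercase() && b[1].is_ascii_digit() && b[2].is_ascii_digit()
}

fn looks_like_snomed(code: &str) -> bool {
    !code.is_empty() && code.bytes().all(|b| b.is_ascii_digit())
}

// Hypothetical custom rule a caller might supply.
fn has_four_chars(code: &str) -> bool {
    code.len() == 4
}

impl CodeList {
    // `custom` plays the role of `Option<&Regex>` from the thread.
    fn validate_codes(&self, custom: Option<&dyn Fn(&str) -> bool>) -> Result<(), ValidationError> {
        // Default validator chosen from the codelist type.
        let builtin: fn(&str) -> bool = match self.codelist_type {
            CodeListType::Icd10 => looks_like_icd10,
            CodeListType::Snomed => looks_like_snomed,
        };
        // Collect every code that fails the custom rule if given, else the default.
        let invalid_codes: Vec<String> = self
            .codes
            .iter()
            .filter(|code| {
                let is_valid = match custom {
                    Some(rule) => rule(code.as_str()),
                    None => builtin(code.as_str()),
                };
                !is_valid
            })
            .cloned()
            .collect();
        if invalid_codes.is_empty() {
            Ok(())
        } else {
            Err(ValidationError { invalid_codes })
        }
    }
}

fn main() {
    let list = CodeList {
        codelist_type: CodeListType::Icd10,
        codes: vec!["A00".to_string(), "B012".to_string()],
    };
    println!("{:?}", list.validate_codes(None));
    println!("{:?}", list.validate_codes(Some(&has_four_chars)));
}
```

With this shape the custom rule is chosen per call rather than stored on the codelist, which is the trade-off discussed in the thread: more flexible, but nothing about the rule is persisted unless it is recorded elsewhere, for example in the metadata.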

/// Get the type of the codelist
@@ -520,7 +528,7 @@ mod tests {
CodeListType::ICD10,
Metadata::default(),
None,
);
)?;
codelist.add_entry("R65.2".to_string(), None, None)?;

codelist.add_entry(
@@ -581,20 +589,22 @@ mod tests {
term_column_name: "test_term".to_string(),
code_field_name: "test_code".to_string(),
term_field_name: "test_term".to_string(),
custom_regex: None,
};

let codelist = CodeList::new(
"test_codelist".to_string(),
CodeListType::ICD10,
Default::default(),
Some(codelist_options),
);
)?;

assert!(codelist.codelist_options.allow_duplicates);
assert_eq!(codelist.codelist_options.code_field_name, "test_code".to_string());
assert_eq!(codelist.codelist_options.term_field_name, "test_term".to_string());
assert_eq!(codelist.codelist_options.code_column_name, "test_code".to_string());
assert_eq!(codelist.codelist_options.term_column_name, "test_term".to_string());
assert_eq!(codelist.codelist_options.custom_regex, None);

assert_eq!(codelist.metadata().provenance.source, Source::ManuallyCreated);
let time_difference = get_time_difference(codelist.metadata().provenance.created_date);
@@ -632,7 +642,7 @@ mod tests {
CodeListType::ICD10,
Default::default(),
None,
);
)?;
codelist.add_entry("R65.2".to_string(), Some("Severe sepsis".to_string()), None)?;
codelist.add_entry("R65.2".to_string(), Some("Severe sepsis".to_string()), None)?;

@@ -807,12 +817,14 @@ mod tests {
}

#[test]
fn test_get_metadata() {
fn test_get_metadata() -> Result<(), CodeListError> {
Collaborator

Good catch

let metadata: Metadata = Default::default();
let codelist =
CodeList::new("test".to_string(), CodeListType::ICD10, metadata.clone(), None);
CodeList::new("test".to_string(), CodeListType::ICD10, metadata.clone(), None)?;

assert_eq!(codelist.metadata(), &metadata);

Ok(())
}
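
Because `CodeList::new` now returns a `Result`, the tests in this diff switch to returning `Result<(), CodeListError>` so that `?` can be used in the test body. A minimal standalone sketch of that idiom follows; `parse_code_length` is a hypothetical fallible helper standing in for the constructor, and `ParseIntError` stands in for `CodeListError`.

```rust
use std::num::ParseIntError;

// Hypothetical fallible helper, standing in for a constructor like `CodeList::new`.
fn parse_code_length(raw: &str) -> Result<usize, ParseIntError> {
    raw.trim().parse::<usize>()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Returning `Result` lets the test body use `?`;
    // the test fails if the function returns `Err`.
    #[test]
    fn parses_valid_length() -> Result<(), ParseIntError> {
        let length = parse_code_length(" 3 ")?;
        assert_eq!(length, 3);
        Ok(())
    }
}

fn main() {
    println!("{:?}", parse_code_length("3"));
}
```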

#[test]
@@ -989,7 +1001,7 @@ mod tests {
CodeListType::SNOMED,
Default::default(),
None,
);
)?;

// A SNOMED list is not truncatable
assert!(snomed_codelist.truncate_to_3_digits(TermManagement::First).is_err());
@@ -1001,16 +1013,20 @@ mod tests {
fn test_truncate_to_3_digits_icd10_4_digits_drop_term() -> Result<(), CodeListError> {
let metadata: Metadata = Default::default();

let mut expected_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None);
let mut expected_codelist = CodeList::new(
"test_codelist".to_string(),
CodeListType::ICD10,
metadata.clone(),
None,
)?;
expected_codelist.add_entry(
"B01".to_string(),
None,
Some("Truncated to 3 digits, term discarded".to_string()),
)?;

let mut observed_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None);
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?;

observed_codelist.add_entry(
"B012".to_string(),
@@ -1029,16 +1045,20 @@ mod tests {
fn test_truncate_to_3_digits_3_and_4_digits_drop_term() -> Result<(), CodeListError> {
let metadata: Metadata = Default::default();

let mut expected_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None);
let mut expected_codelist = CodeList::new(
"test_codelist".to_string(),
CodeListType::ICD10,
metadata.clone(),
None,
)?;
expected_codelist.add_entry(
"B01".to_string(),
Some("Varicella [chickenpox]".to_string()),
None,
)?;

let mut observed_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None);
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?;

observed_codelist.add_entry(
"B01".to_string(),
@@ -1062,16 +1082,20 @@ mod tests {
fn test_truncate_to_3_digits_icd10_4_digits_first() -> Result<(), CodeListError> {
let metadata: Metadata = Default::default();

let mut expected_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None);
let mut expected_codelist = CodeList::new(
"test_codelist".to_string(),
CodeListType::ICD10,
metadata.clone(),
None,
)?;
expected_codelist.add_entry(
"B01".to_string(),
Some("Varicella pneumonia".to_string()),
Some("B012 truncated to 3 digits, term first encountered".to_string()),
)?;

let mut observed_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None);
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?;

observed_codelist.add_entry(
"B012".to_string(),
@@ -1090,16 +1114,20 @@ mod tests {
fn test_truncate_to_3_digits_3_and_4_digits_first() -> Result<(), CodeListError> {
let metadata: Metadata = Default::default();

let mut expected_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None);
let mut expected_codelist = CodeList::new(
"test_codelist".to_string(),
CodeListType::ICD10,
metadata.clone(),
None,
)?;
expected_codelist.add_entry(
"B01".to_string(),
Some("Varicella [chickenpox]".to_string()),
None,
)?;

let mut observed_codelist =
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None);
CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?;

observed_codelist.add_entry(
"B01".to_string(),
@@ -1126,7 +1154,7 @@ mod tests {
CodeListType::ICD10,
Default::default(),
None,
);
)?;
expected_codelist.add_entry("A10".to_string(), Some("Cholera".to_string()), None)?;

expected_codelist.add_entry(
@@ -1161,7 +1189,7 @@ mod tests {
CodeListType::ICD10,
Default::default(),
None,
);
)?;
expected_codelist.add_entry("A10".to_string(), Some("Cholera".to_string()), None)?;

expected_codelist.add_entry(
@@ -1196,7 +1224,7 @@ mod tests {
CodeListType::SNOMED,
Default::default(),
None,
);
)?;

// A SNOMED list is not x_appendable
assert!(snomed_codelist.add_x_codes().is_err());
12 changes: 6 additions & 6 deletions rust/codelist-rs/src/codelist_factory.rs
@@ -68,7 +68,7 @@ impl CodeListFactory {
self.codelist_type.clone(),
self.metadata.clone(),
Some(self.codelist_options.clone()),
);
)?;

let code_column: Vec<_> = headers
.iter()
@@ -179,7 +179,7 @@ impl CodeListFactory {
self.codelist_type.clone(),
self.metadata.clone(),
Some(self.codelist_options.clone()),
);
)?;

let file = std::fs::File::open(file_path)?;
let reader = std::io::BufReader::new(file);
@@ -431,13 +431,13 @@ mod tests {
CodeListType::ICD10,
factory.metadata.clone(),
Some(factory.codelist_options.clone()),
);
)?;
let codelist2 = CodeList::new(
"test_codelist2".to_string(),
CodeListType::ICD10,
factory.metadata.clone(),
Some(factory.codelist_options.clone()),
);
)?;
let codelists = factory.load_codelists(Some(vec![codelist1, codelist2]), None)?;
Ok(codelists)
}
@@ -1001,13 +1001,13 @@ B02,Test Disease 2,Description 2";
CodeListType::ICD10,
factory.metadata.clone(),
Some(factory.codelist_options.clone()),
);
)?;
let codelist2 = CodeList::new(
"test_codelist2".to_string(),
CodeListType::ICD10,
factory.metadata.clone(),
Some(factory.codelist_options.clone()),
);
)?;
let codelists = factory.load_codelists(Some(vec![codelist1, codelist2]), None)?;

// load codelists from folder
5 changes: 4 additions & 1 deletion rust/codelist-rs/src/codelist_options.rs
@@ -14,7 +14,8 @@ pub struct CodeListOptions {
pub code_column_name: String, // for csv files
pub term_column_name: String, // for csv files
pub code_field_name: String, // for json files
pub term_field_name: String, // for json files
pub term_field_name: String,
pub custom_regex: Option<String>, // for custom validation
}

impl Default for CodeListOptions {
@@ -29,6 +30,7 @@ impl Default for CodeListOptions {
term_column_name: "term".to_string(),
code_field_name: "code".to_string(),
term_field_name: "term".to_string(),
custom_regex: None,
}
}
}
@@ -45,5 +47,6 @@ mod tests {
assert_eq!(options.term_column_name, "term");
assert_eq!(options.code_field_name, "code");
assert_eq!(options.term_field_name, "term");
assert_eq!(options.custom_regex, None);
}
}
5 changes: 5 additions & 0 deletions rust/codelist-rs/src/errors.rs
@@ -3,6 +3,7 @@
use std::io;

use csv;
use regex;
use serde_json;

/// Enum to represent the different types of errors that can occur in the
@@ -138,4 +139,8 @@ pub enum CodeListError {

#[error("{codelist_type} cannot be transformed by having X added to the end of it")]
CodeListNotXAddable { codelist_type: String },

#[error("Invalid custom regex pattern: {0}")]
#[construct(skip)]
InvalidRegexPattern(#[from] regex::Error),
}
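
The `#[from]` attribute on the new variant is what allows `Regex::new(regex_str)?` in `CodeList::new` to convert a `regex::Error` into a `CodeListError` automatically. A rough, hand-written sketch of that mechanism is shown below using only the standard library; `ConfigError`, `Config` and the `max_len` field are hypothetical, and `ParseIntError` stands in for `regex::Error`. The `From` impl approximates what `thiserror` derives and is not its exact expansion.

```rust
use std::fmt;
use std::num::ParseIntError;

// Hypothetical error enum; `ParseIntError` stands in for `regex::Error`.
#[derive(Debug)]
enum ConfigError {
    InvalidNumber(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::InvalidNumber(e) => write!(f, "Invalid number: {}", e),
        }
    }
}

impl std::error::Error for ConfigError {}

// Roughly the conversion that a `#[from]` attribute generates.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::InvalidNumber(e)
    }
}

// Hypothetical type whose constructor validates an optional setting up front,
// mirroring how `CodeList::new` checks `custom_regex` before building the value.
struct Config {
    max_len: Option<String>,
}

impl Config {
    fn new(max_len: Option<String>) -> Result<Self, ConfigError> {
        if let Some(raw) = &max_len {
            // `?` converts ParseIntError into ConfigError through the `From` impl.
            raw.parse::<usize>()?;
        }
        Ok(Config { max_len })
    }
}

fn main() {
    match Config::new(Some("not a number".to_string())) {
        Ok(config) => println!("ok: {:?}", config.max_len),
        Err(e) => println!("error: {}", e),
    }
}
```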
3 changes: 2 additions & 1 deletion rust/codelist-validator-rs/src/ctv3_validator.rs
@@ -80,7 +80,8 @@ mod tests {
CodeListType::CTV3,
Metadata::default(),
None,
);
)?;

Ok(codelist)
}
