Add custom validation #97
Changes from all commits
15ad24f
4a7452c
831088d
44c94d3
097bc17
3b2bd5f
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,6 +8,7 @@ use std::{ | |
| }; | ||
|
|
||
| use csv::Writer; | ||
| use regex::Regex; | ||
| use serde::{Deserialize, Serialize}; | ||
|
|
||
| // Internal imports | ||
|
|
@@ -52,15 +53,22 @@ impl CodeList { | |
| codelist_type: CodeListType, | ||
| metadata: Metadata, | ||
| options: Option<CodeListOptions>, | ||
| ) -> Self { | ||
| CodeList { | ||
| ) -> Result<Self, CodeListError> { | ||
| let options = options.unwrap_or_default(); | ||
|
|
||
| // Validate custom regex if it has been set | ||
| if let Some(regex_str) = &options.custom_regex { | ||
| Regex::new(regex_str)?; | ||
| } | ||
|
|
||
| Ok(CodeList { | ||
|
Comment on lines +56 to +64
Collaborator
I'm not sure about this. I'd be interested to get @oylenshpeegul's opinion on this as well. If we go back to the user experience, I think that will guide us. I think in Python we want to be able to do the following:

```python
# Create a new codelist object
c = CodeList("Pneumonia", "ICD10", "Manually created")
# Add a code to the list
c.add_entry("A119", "pneumonia")
# validate the codelist
c.validate_codes()  # validates ICD
# validate custom
c.validate_codes(regex_str=r"^[A-Z][0-9]{2}\.[0-9]$")
```

Users would need to use regex in Python, so we could decide whether it accepts a string or a regex expression. Let's maybe start with a regex expression and see what that might mean for typing. I think we want something like a match statement for using the regex or not.

```rust
pub trait Validator {
    // The trait signature takes the optional regex so that it matches the impl below
    fn validate_codes(&self, custom_regex: Option<&Regex>) -> Result<(), CodeListValidatorError>;
}

impl Validator for CodeList {
    fn validate_codes(&self, custom_regex: Option<&Regex>) -> Result<(), CodeListValidatorError> {
        match custom_regex {
            Some(regex) => {...actual implementation},
            None => {
                match self.codelist_type {
                    CodeListType::ICD10 => IcdValidator(self).validate_all_code(),
                    CodeListType::SNOMED => SnomedValidator(self).validate_all_code(),
                    CodeListType::OPCS => OpcsValidator(self).validate_all_code(),
                    CodeListType::CTV3 => Ctv3Validator(self).validate_all_code(),
                }
            }
        }
    }
}
```

With this in mind, putting the regex into What do you think?
Collaborator
That's a good point! If we want to serialize I don't think I understand the Python part of your comment. Does PyO3 even translate regexes between Python and Rust? Python regexes are the fancy backtracking kind, whereas in Rust they are the more restricted finite automata kind.
Collaborator
Author
My thinking was that the regex could be included in the CodeListOptions, which are customisable, and that this would link the regex to the particular codelist. But I realise the way I've implemented it means that if the regex is set, you can only validate with the regex and not with the normal validation, which depends on the codelist type. Using it as a parameter is a lot more flexible, but the regex will not be 'saved' in the codelist struct. I guess validation notes can be added to the metadata to document whether different validation methods have been used. I think with PyO3 the Python side can just pass a string, and under the hood the Rust can convert it to a regex within the method. Or maybe we can just pass a regex from Python; I'm not exactly sure how that would work. But to be honest, I think Caroline's method is a lot simpler and I probably overcomplicated this!
Collaborator
Author
Is CodeListOptions something we want to expose to the Python API so it can be altered by the user? Maybe I'm misunderstanding the purpose of these options 🤔
Collaborator
Author
I'll have a go at implementing the other approach in a separate PR, then we can see them both.
Collaborator
We will want to expose CodeListOptions at some point in the Python API, I think. |
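The parameter-based approach discussed in this thread can be sketched with the standard library alone. This is only an illustration under stated assumptions: a plain `&dyn Fn(&str) -> bool` predicate stands in for a compiled `Regex`, the types are cut-down hypothetical versions of the crate's `CodeList` and `CodeListValidatorError`, and `looks_like_icd10` / `looks_like_snomed` are deliberately crude placeholders for the real type-specific validators.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum CodeListType {
    ICD10,
    SNOMED,
}

// Hypothetical, reduced error type for this sketch only.
#[derive(Debug, PartialEq)]
enum CodeListValidatorError {
    InvalidCode(String),
}

// Reduced stand-in for the crate's CodeList: code -> optional term.
struct CodeList {
    codelist_type: CodeListType,
    entries: BTreeMap<String, Option<String>>,
}

// Placeholder rule: one uppercase ASCII letter followed by two digits.
fn looks_like_icd10(code: &str) -> bool {
    let bytes = code.as_bytes();
    bytes.len() >= 3
        && bytes[0].is_ascii_uppercase()
        && bytes[1].is_ascii_digit()
        && bytes[2].is_ascii_digit()
}

// Placeholder rule: 6 to 18 ASCII digits.
fn looks_like_snomed(code: &str) -> bool {
    (6..=18).contains(&code.len()) && code.bytes().all(|b| b.is_ascii_digit())
}

impl CodeList {
    // `custom` plays the role of `Option<&Regex>` in the review comment:
    // when present it replaces the type-specific check, otherwise the
    // check is chosen from the codelist type.
    fn validate_codes(
        &self,
        custom: Option<&dyn Fn(&str) -> bool>,
    ) -> Result<(), CodeListValidatorError> {
        let is_valid = |code: &str| match custom {
            Some(check) => check(code),
            None => match self.codelist_type {
                CodeListType::ICD10 => looks_like_icd10(code),
                CodeListType::SNOMED => looks_like_snomed(code),
            },
        };

        // Report the first code that fails the selected check.
        for code in self.entries.keys() {
            if !is_valid(code.as_str()) {
                return Err(CodeListValidatorError::InvalidCode(code.clone()));
            }
        }
        Ok(())
    }
}
```

With this shape the custom rule is supplied per call and is not stored on the struct, which matches the trade-off raised above: more flexible, but nothing records which rule was used unless the caller notes it elsewhere.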
||
| name, | ||
| entries: BTreeMap::new(), | ||
| codelist_type, | ||
| metadata, | ||
| logs: Vec::new(), | ||
| codelist_options: options.unwrap_or_default(), | ||
| } | ||
| codelist_options: options, | ||
| }) | ||
| } | ||
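The constructor change in this hunk relies on `?` converting the pattern error into `CodeListError`, which requires a `From` implementation somewhere in the crate. A minimal standard-library sketch of that mechanism follows. Everything here is an assumption made for illustration: `PatternError` and `check_pattern` stand in for `regex::Error` and `Regex::new`, the `InvalidPattern` variant is hypothetical, and the structs are reduced to the fields needed.

```rust
// Hypothetical, reduced error type; the crate's CodeListError has its own variants.
#[derive(Debug, PartialEq)]
enum CodeListError {
    InvalidPattern(String),
}

// Stand-in for the error type returned by `Regex::new`.
#[derive(Debug)]
struct PatternError(String);

// This conversion is what allows `?` to turn a PatternError into a CodeListError.
impl From<PatternError> for CodeListError {
    fn from(err: PatternError) -> Self {
        CodeListError::InvalidPattern(err.0)
    }
}

// Stand-in for `Regex::new`: it only checks that parentheses are balanced
// and ignores escaping, so it is far weaker than real regex compilation.
fn check_pattern(pattern: &str) -> Result<(), PatternError> {
    let mut depth: i32 = 0;
    for ch in pattern.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return Err(PatternError(format!("unmatched ')' in {}", pattern)));
        }
    }
    if depth != 0 {
        return Err(PatternError(format!("unclosed '(' in {}", pattern)));
    }
    Ok(())
}

#[derive(Debug, Default, PartialEq)]
struct CodeListOptions {
    custom_regex: Option<String>,
}

#[derive(Debug, PartialEq)]
struct CodeList {
    name: String,
    codelist_options: CodeListOptions,
}

impl CodeList {
    // Fallible constructor: a bad pattern is rejected before the struct exists.
    fn new(name: String, options: Option<CodeListOptions>) -> Result<Self, CodeListError> {
        let options = options.unwrap_or_default();

        // Validate the custom pattern only if one has been set.
        if let Some(pattern) = &options.custom_regex {
            check_pattern(pattern)?;
        }

        Ok(CodeList { name, codelist_options: options })
    }
}
```

Because `new` now returns a `Result`, every caller has to handle the error, which is why the tests in this diff gain a trailing `?` and, where needed, a `Result<(), CodeListError>` return type.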
|
|
||
| /// Get the type of the codelist | ||
|
|
@@ -520,7 +528,7 @@ mod tests { | |
| CodeListType::ICD10, | ||
| Metadata::default(), | ||
| None, | ||
| ); | ||
| )?; | ||
| codelist.add_entry("R65.2".to_string(), None, None)?; | ||
|
|
||
| codelist.add_entry( | ||
|
|
@@ -581,20 +589,22 @@ mod tests { | |
| term_column_name: "test_term".to_string(), | ||
| code_field_name: "test_code".to_string(), | ||
| term_field_name: "test_term".to_string(), | ||
| custom_regex: None, | ||
| }; | ||
|
|
||
| let codelist = CodeList::new( | ||
| "test_codelist".to_string(), | ||
| CodeListType::ICD10, | ||
| Default::default(), | ||
| Some(codelist_options), | ||
| ); | ||
| )?; | ||
|
|
||
| assert!(codelist.codelist_options.allow_duplicates); | ||
| assert_eq!(codelist.codelist_options.code_field_name, "test_code".to_string()); | ||
| assert_eq!(codelist.codelist_options.term_field_name, "test_term".to_string()); | ||
| assert_eq!(codelist.codelist_options.code_column_name, "test_code".to_string()); | ||
| assert_eq!(codelist.codelist_options.term_column_name, "test_term".to_string()); | ||
| assert_eq!(codelist.codelist_options.custom_regex, None); | ||
|
|
||
| assert_eq!(codelist.metadata().provenance.source, Source::ManuallyCreated); | ||
| let time_difference = get_time_difference(codelist.metadata().provenance.created_date); | ||
|
|
@@ -632,7 +642,7 @@ mod tests { | |
| CodeListType::ICD10, | ||
| Default::default(), | ||
| None, | ||
| ); | ||
| )?; | ||
| codelist.add_entry("R65.2".to_string(), Some("Severe sepsis".to_string()), None)?; | ||
| codelist.add_entry("R65.2".to_string(), Some("Severe sepsis".to_string()), None)?; | ||
|
|
||
|
|
@@ -807,12 +817,14 @@ mod tests { | |
| } | ||
|
|
||
| #[test] | ||
| fn test_get_metadata() { | ||
| fn test_get_metadata() -> Result<(), CodeListError> { | ||
|
Collaborator
Good catch |
||
| let metadata: Metadata = Default::default(); | ||
| let codelist = | ||
| CodeList::new("test".to_string(), CodeListType::ICD10, metadata.clone(), None); | ||
| CodeList::new("test".to_string(), CodeListType::ICD10, metadata.clone(), None)?; | ||
|
|
||
| assert_eq!(codelist.metadata(), &metadata); | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| #[test] | ||
|
|
@@ -989,7 +1001,7 @@ mod tests { | |
| CodeListType::SNOMED, | ||
| Default::default(), | ||
| None, | ||
| ); | ||
| )?; | ||
|
|
||
| // A SNOMED list is not truncatable | ||
| assert!(snomed_codelist.truncate_to_3_digits(TermManagement::First).is_err()); | ||
|
|
@@ -1001,16 +1013,20 @@ mod tests { | |
| fn test_truncate_to_3_digits_icd10_4_digits_drop_term() -> Result<(), CodeListError> { | ||
| let metadata: Metadata = Default::default(); | ||
|
|
||
| let mut expected_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None); | ||
| let mut expected_codelist = CodeList::new( | ||
| "test_codelist".to_string(), | ||
| CodeListType::ICD10, | ||
| metadata.clone(), | ||
| None, | ||
| )?; | ||
| expected_codelist.add_entry( | ||
| "B01".to_string(), | ||
| None, | ||
| Some("Truncated to 3 digits, term discarded".to_string()), | ||
| )?; | ||
|
|
||
| let mut observed_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None); | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?; | ||
|
|
||
| observed_codelist.add_entry( | ||
| "B012".to_string(), | ||
|
|
@@ -1029,16 +1045,20 @@ mod tests { | |
| fn test_truncate_to_3_digits_3_and_4_digits_drop_term() -> Result<(), CodeListError> { | ||
| let metadata: Metadata = Default::default(); | ||
|
|
||
| let mut expected_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None); | ||
| let mut expected_codelist = CodeList::new( | ||
| "test_codelist".to_string(), | ||
| CodeListType::ICD10, | ||
| metadata.clone(), | ||
| None, | ||
| )?; | ||
| expected_codelist.add_entry( | ||
| "B01".to_string(), | ||
| Some("Varicella [chickenpox]".to_string()), | ||
| None, | ||
| )?; | ||
|
|
||
| let mut observed_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None); | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?; | ||
|
|
||
| observed_codelist.add_entry( | ||
| "B01".to_string(), | ||
|
|
@@ -1062,16 +1082,20 @@ mod tests { | |
| fn test_truncate_to_3_digits_icd10_4_digits_first() -> Result<(), CodeListError> { | ||
| let metadata: Metadata = Default::default(); | ||
|
|
||
| let mut expected_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None); | ||
| let mut expected_codelist = CodeList::new( | ||
| "test_codelist".to_string(), | ||
| CodeListType::ICD10, | ||
| metadata.clone(), | ||
| None, | ||
| )?; | ||
| expected_codelist.add_entry( | ||
| "B01".to_string(), | ||
| Some("Varicella pneumonia".to_string()), | ||
| Some("B012 truncated to 3 digits, term first encountered".to_string()), | ||
| )?; | ||
|
|
||
| let mut observed_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None); | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?; | ||
|
|
||
| observed_codelist.add_entry( | ||
| "B012".to_string(), | ||
|
|
@@ -1090,16 +1114,20 @@ mod tests { | |
| fn test_truncate_to_3_digits_3_and_4_digits_first() -> Result<(), CodeListError> { | ||
| let metadata: Metadata = Default::default(); | ||
|
|
||
| let mut expected_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata.clone(), None); | ||
| let mut expected_codelist = CodeList::new( | ||
| "test_codelist".to_string(), | ||
| CodeListType::ICD10, | ||
| metadata.clone(), | ||
| None, | ||
| )?; | ||
| expected_codelist.add_entry( | ||
| "B01".to_string(), | ||
| Some("Varicella [chickenpox]".to_string()), | ||
| None, | ||
| )?; | ||
|
|
||
| let mut observed_codelist = | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None); | ||
| CodeList::new("test_codelist".to_string(), CodeListType::ICD10, metadata, None)?; | ||
|
|
||
| observed_codelist.add_entry( | ||
| "B01".to_string(), | ||
|
|
@@ -1126,7 +1154,7 @@ mod tests { | |
| CodeListType::ICD10, | ||
| Default::default(), | ||
| None, | ||
| ); | ||
| )?; | ||
| expected_codelist.add_entry("A10".to_string(), Some("Cholera".to_string()), None)?; | ||
|
|
||
| expected_codelist.add_entry( | ||
|
|
@@ -1161,7 +1189,7 @@ mod tests { | |
| CodeListType::ICD10, | ||
| Default::default(), | ||
| None, | ||
| ); | ||
| )?; | ||
| expected_codelist.add_entry("A10".to_string(), Some("Cholera".to_string()), None)?; | ||
|
|
||
| expected_codelist.add_entry( | ||
|
|
@@ -1196,7 +1224,7 @@ mod tests { | |
| CodeListType::SNOMED, | ||
| Default::default(), | ||
| None, | ||
| ); | ||
| )?; | ||
|
|
||
| // A SNOMED list is not x_appendable | ||
| assert!(snomed_codelist.add_x_codes().is_err()); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,7 +80,8 @@ mod tests { | |
| CodeListType::CTV3, | ||
| Metadata::default(), | ||
| None, | ||
| ); | ||
| )?; | ||
|
|
||
| Ok(codelist) | ||
| } | ||
|
|
||
|
|
||
Thanks for adding this in.