Conversation
For review purposes, don't worry about the math functions regarding splines. Reviewers can probably start around line 293.
```python
new_question = Question.objects.get(id=question.id)
new_question.id = None
new_question.group_id = None
new_question.save()
```
Let's replace this with `questions.services.common.clone_question`.
```python
for k, v in new_post.__dict__.items():
    if (
        k.startswith("_")
        or k == "id"
        or k == "group_of_questions_id"
        or k == "conditional_id"
    ):
        pass
    elif k == "question_id":
        post_dict[k] = new_question.id
    else:
        post_dict[k] = v
new_post = Post(**post_dict)
```
Let's move this to a separate `clone_post` function:

```python
def clone_post(post: Post, **kwargs):
    ...


clone_post(post, question=new_question)
```
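A hypothetical sketch of what such a helper could look like. The real `clone_post`/`clone_question` services aren't shown in this diff, so `EXCLUDED_FIELDS` and the keyword-override behavior here are assumptions inferred from the loop above; a real Django `Post` may also need special handling for relations that `__dict__` does not capture.

```python
# Hypothetical sketch only; field names come from the reviewed loop above.
EXCLUDED_FIELDS = {"id", "group_of_questions_id", "conditional_id"}


def clone_post(post, **overrides):
    # Copy every concrete field except internals (leading "_") and the
    # excluded identifiers, then apply caller-supplied overrides.
    field_values = {
        k: v
        for k, v in post.__dict__.items()
        if not k.startswith("_") and k not in EXCLUDED_FIELDS
    }
    field_values.update(overrides)
    return type(post)(**field_values)
```

Callers would then write something like `clone_post(post, question_id=new_question.id)` instead of building `post_dict` inline.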
```python
new_forecasts: list[Forecast] = []
for forecast in original_forecasts.iterator(chunk_size=100):
    forecast.id = None
    forecast.pk = None
    forecast.question = new_question
    forecast.post = new_post
    new_forecasts.append(forecast)
if new_forecasts:
    Forecast.objects.bulk_create(new_forecasts, batch_size=500)
```
The issue is that you keep the entire `new_forecasts` list in memory, which can eat a lot of RAM.
I already have a util for batched updates: `utils.models.ModelBatchUpdater`. Maybe you can create a similar helper, but for batched creation?
Something like:
```python
class ModelBatchCreator(ModelBatchUpdater):
    def __init__(
        self,
        model_class: type[DjangoModelType],
        batch_size: int = 100,
    ):
        self.model_class = model_class
        self.batch_size = batch_size
        self._batch: list[DjangoModelType] = []

    def append(self, obj: DjangoModelType) -> None:
        self._batch.append(obj)
        if len(self._batch) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._batch:
            self.model_class.objects.bulk_create(self._batch)
            self._batch.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.flush()
```

And then use it like:

```python
# Creating forecasts in bounded-size batches
with ModelBatchCreator(
    model_class=Forecast, batch_size=500
) as creator:
    for idx, forecast in enumerate(original_forecasts.iterator(chunk_size=500)):
        forecast.id = None
        forecast.pk = None
        forecast.question = new_question
        forecast.post = new_post
        creator.append(forecast)

        if idx % 500 == 0:
            logger.info(f"Created {idx}/{total} forecasts")
```

```python
question_to_change.scheduled_resolve_time = new_scheduled_close_time
question_to_change.save()
post = question_to_change.get_post()
assert post
```
`assert` statements are stripped when Python runs with the `-O` flag, so this check can silently disappear in production. Please replace it with an explicit exception raise.
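A minimal sketch of the replacement; the helper name, exception type, and message are illustrative, not taken from the codebase:

```python
def require_post(question):
    # Explicit check instead of `assert post`: this still runs under
    # `python -O`, where assert statements are compiled away.
    post = question.get_post()
    if post is None:
        raise ValueError(f"Question {question.id} has no associated post")
    return post
```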
```python
new_cdf = np.cumsum(new_pmf).tolist()[:-1]
return new_cdf
```
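For context on the snippet above: a running sum over a PMF produces the CDF, and `[:-1]` drops the final entry, which is the total mass (~1.0). A dependency-free sketch of the same transform using `itertools.accumulate`, which matches `np.cumsum` for 1-D lists:

```python
from itertools import accumulate


def pmf_to_cdf(pmf):
    # Running total of probability masses; drop the last value (the full
    # sum), mirroring np.cumsum(new_pmf).tolist()[:-1] in the diff.
    return list(accumulate(pmf))[:-1]
```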
```python
print("rescaling forecasts...")
```
Let's replace it with logging
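A sketch of what that might look like; the logger name, message format, and reporting interval are illustrative:

```python
import logging

logger = logging.getLogger(__name__)


def log_progress(done, total, every=100):
    # Periodic log records instead of print(..., end="\r"): log lines
    # survive in files and aggregators, unlike carriage-return output.
    if done % every == 0 or done == total:
        logger.info("Rescaled %d/%d forecasts", done, total)
```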
```python
for i, forecast in enumerate(forecasts.iterator(chunk_size=100), 1):
    print(i, "/", c, end="\r")
    forecast.continuous_cdf = transform_cdf(forecast.continuous_cdf)
    forecast.distribution_input = None
    updated_forecasts.append(forecast)
print()
print("Done")
if updated_forecasts:
    print("Saving forecasts...", end="\r")
    with transaction.atomic():
        Forecast.objects.bulk_update(
            updated_forecasts,
            fields=["continuous_cdf", "distribution_input"],
            batch_size=500,
        )
    print("Saving forecasts... DONE")
```
Same here -- let's use ModelBatchUpdater
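The `ModelBatchUpdater` util itself isn't shown in this diff, so as a hedged illustration of the pattern it presumably implements, here is a framework-free collector that flushes fixed-size batches through a callback (the real util would flush via `bulk_update` with `fields`):

```python
class BatchCollector:
    # Framework-free sketch of the batched-flush pattern assumed for
    # utils.models.ModelBatchUpdater; flush_fn stands in for bulk_update.
    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self._batch = []

    def append(self, obj):
        self._batch.append(obj)
        if len(self._batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._batch:
            self.flush_fn(list(self._batch))
            self._batch.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.flush()
```

The point of the pattern: peak memory is bounded by `batch_size` instead of growing with the full queryset.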
```python
if question.type not in [
    Question.QuestionType.NUMERIC,
    Question.QuestionType.DISCRETE,
    Question.QuestionType.DATE,
]:
```
```python
if question.type not in QUESTION_CONTINUOUS_TYPES
```
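Assuming `QUESTION_CONTINUOUS_TYPES` is (or would be) a constant grouping the three continuous types, a stand-alone sketch; the enum values are placeholders, not the real `Question.QuestionType` definitions:

```python
from enum import Enum


class QuestionType(str, Enum):
    # Placeholder stand-in for Question.QuestionType.
    NUMERIC = "numeric"
    DISCRETE = "discrete"
    DATE = "date"
    BINARY = "binary"


QUESTION_CONTINUOUS_TYPES = (
    QuestionType.NUMERIC,
    QuestionType.DISCRETE,
    QuestionType.DATE,
)


def is_continuous(question_type):
    return question_type in QUESTION_CONTINUOUS_TYPES
```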
…reshape-continuous-command
closes #3707
adds command:
```shell
python manage.py reshape_continuous_question
```

params:
This command will be rarely (hopefully essentially never) used, and the code doesn't need to be polished. But it should be checked over for any logic faults.