-
Notifications
You must be signed in to change notification settings - Fork 17
Ensure idempotence of model evaluations #3186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c87746b to
3149c5b
Compare
cd85127 to
0483d08
Compare
|
I tested this using an optimising (large tokamak regression test) and non-optimising (once-through at LT solution vector) case. When non-optimising, debug logs give: After the 4th model evaluation the mfiles converge, 5 evaluations are required to check this. In the optimising case: Varying numbers of re-evaluations are required to gain idempotency when solving: this only requires the objective function and constraints to converge. Once a solution has been found however, it is immediately found to be idempotent, possibly because the solver steps have become so small. SummaryI believe this is now working, and should fix the problem. The main contentious implementation detail is the re-directing of OUT.DAT and MFILE.DAT output to temporary IDEM_OUT.DAT and IDEM_MFILE.DAT files for the purposes of checking idempotence. This was done because the OUT.DAT and MFILE.DAT writing is currently intertwined in Fortran, and difficult to easily separate due to being written all over the place. The scan functionality also needs to be preserved. But mainly the imminent conversion to Python trumps a thorough Fortran refactor here; indeed this could then be done without file IO at all. LT regression differencesThis has caused some differences to the large tokamak solution. Of particular note: |
0483d08 to
20403d2
Compare
|
In large_tokamak,
The CS fatigue constraint does not seem to be active, so I don't think the change in |
|
Can you list which variables are changing most between successive evaluations? |
timothy-nunn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor quality comments. I think if we are going to make such a change it needs to be very clear where files are being written.
We have discussed the possibility of adding a user-defined iteration limit (rather than hard coded 10). Is this something we still want to do?
process/evaluators.py
Outdated
| objf, conf = self.caller.call_models(xv, m) | ||
|
|
||
| # Convergence loop to ensure burn time consistency | ||
| # TODO This can be removed once model evaluations become effectively |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not the case now? Can't we remove this consistency check now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
process/caller.py
Outdated
| # TODO The only way to ensure idempotence in all outputs is by comparing | ||
| # mfiles at this stage | ||
| previous_mfile_arr = None | ||
| file_prefix = ft.global_variables.fileprefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added explanatory comment.
process/caller.py
Outdated
|
|
||
| # Extract mfile data | ||
| mfile_path = ( | ||
| f2py_compatible_to_string(file_prefix).split("IN.DAT")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would ft.global_variables.output_prefix work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've changed this.
process/caller.py
Outdated
| "constraints." | ||
| ) | ||
|
|
||
| def call_models_full_idempotence(self, xc: np.ndarray, ifail: int) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also writes the mfiles, I think the name and description should make this very clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree; changed accordingly.
|
@mkovari I've sent you the non-idempotent and idempotent OUT.DATs and MFILE.DATs for the large tokamak case. I've also shared the presentation that goes into detail about the individual variable changes between iterations. |
After investigation this change is completely legitimate. When the initial "radial" crack size is more than half the radial thickness of the conduit, the fatigue life is zero. By chance it is just under half in the "idempotent" version, and just over in the "non-idempotent" version. For discussion of how to ensure sufficient fatigue life, see #3191. |
|
Many of the variables that change the most when the models are re-evaluated are related to the CS, the burn time and the flux swing. We already know that this calculation seems rather circular and very hard to understand. If anyone is feeling clever they should attempt to rewrite this - my brain is too small at the moment. This problem should not stop this pull request from going ahead. |
Thanks @timothy-nunn. Hopefully I've made it clear now where the output files are being written, and whether they're intermediary or final outputs. Regarding the idea of the user-defined model evaluation limit, I think this was mainly a concern in the case of huge slowdowns or Process; by splitting up the the re-evaluations into "solver-required idempotence only" vs. "full final mfile idempotence", the number of total evaluations is purely necessary to get correct results; any fewer and there's a real risk of the solver taking the wrong path or the output being incorrect. I'm not sure there's much need currently to raise it beyond 10 evaluations; all runs so far indicate idempotence is achieved in 5 evaluations at most. It could be required in future however (for a particular input file/future model modification), but I'm inclined to include it when/if it's required. I'd appreciate a second review! |
timothy-nunn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another couple of QOL changes. Also, can you update this branch with main so I can check this doesn't cause the new regression tests to not converge (it will probably fail the ST test due to a NaN, but all others should converge idempotently).
source/fortran/init_module.f90
Outdated
| end subroutine open_idempotence_files | ||
|
|
||
| subroutine close_idempotence_files | ||
| ! Revert back to writing to original OUT.DAT and MFILE.DAT, after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you're saying and doing, but it might be worth being a bit more explicit:
- Closing the intermediate idempotence-check files, deleting them in the process
- Re-opening the actual output files ready to write the final data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made more explicit.
process/caller.py
Outdated
| logger.debug("Mfiles not idempotent, evaluating models again") | ||
| previous_mfile_arr = np.copy(current_mfile_arr) | ||
|
|
||
| raise RuntimeError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error, or an exception raised when calling the models before final idempotence will cause the idempotence check files to be left in the filesystem. Are they useful? Or does an except block need to remove them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: they might be useful, but most of the time they'd be an annoyance. I've added an except block.
28f33df to
c04b289
Compare
|
Thanks @timothy-nunn, I think I've responded to your comments. I've rebased to allow you to check the regression tests. |
|
Thanks @jonmaddock. The @mkovari @jmorris-uk any thoughts? |
|
Can you send or post the two OUT.DAT files?
All the best,
Michael
…________________________________
From: Timothy ***@***.***>
Sent: Monday, June 3, 2024 09:46
To: ukaea/PROCESS ***@***.***>
Cc: Kovari, Michael D ***@***.***>; Mention ***@***.***>
Subject: Re: [ukaea/PROCESS] Ensure idempotence of model evaluations (PR #3186)
Thanks @jonmaddock<https://github.com/jonmaddock>.
The large_tokamak_once_through file appears to have some large changes.
WARNING test_process_input_files:test_process_input_files.py:110 Variable Ref New % Change
WARNING test_process_input_files:test_process_input_files.py:113 ------------------------------------------------------------
WARNING test_process_input_files:test_process_input_files.py:115 vsbrn 0.382 279 72913.33
WARNING test_process_input_files:test_process_input_files.py:115 pfwpmw 0.0683 1.24 1709.38
WARNING test_process_input_files:test_process_input_files.py:115 tcycle 2.63e+03 9.92e+03 277.05
WARNING test_process_input_files:test_process_input_files.py:115 coe 161 505 214.61
WARNING test_process_input_files:test_process_input_files.py:115 ineq_con016 -0.134 0.0132 109.85
WARNING test_process_input_files:test_process_input_files.py:115 vsstt 279 558 99.65
WARNING test_process_input_files:test_process_input_files.py:115 pnetelmw 346 405 16.99
WARNING test_process_input_files:test_process_input_files.py:115 pnetelmw. 346 405 16.99
WARNING test_process_input_files:test_process_input_files.py:115 c2253 14.6 17.1 16.99
WARNING test_process_input_files:test_process_input_files.py:115 pnetelmw/powfmw 21.2 24.9 16.99
WARNING test_process_input_files:test_process_input_files.py:115 pnetelmw/(powfmw+emultmw 17.9 20.9 16.99
WARNING test_process_input_files:test_process_input_files.py:115 c242 8.64 7.95 -8.01
WARNING test_process_input_files:test_process_input_files.py:115 c243 7.08 6.48 -8.52
WARNING test_process_input_files:test_process_input_files.py:115 pacpmw 607 547 -9.90
WARNING test_process_input_files:test_process_input_files.py:115 cirpowfr 0.603 0.536 -11.17
WARNING test_process_input_files:test_process_input_files.py:115 cppa 34.6 28.8 -16.64
WARNING test_process_input_files:test_process_input_files.py:115 c2262 34.6 28.8 -16.64
WARNING test_process_input_files:test_process_input_files.py:115 psechtmw 285 226 -20.64
WARNING test_process_input_files:test_process_input_files.py:115 c226 485 330 -32.11
WARNING test_process_input_files:test_process_input_files.py:115 c2174 11.6 7.53 -35.03
WARNING test_process_input_files:test_process_input_files.py:115 cryv 2.52e+04 1.64e+04 -35.03
WARNING test_process_input_files:test_process_input_files.py:115 c2263 342 192 -43.89
WARNING test_process_input_files:test_process_input_files.py:115 crymw 103 43.7 -57.79
WARNING test_process_input_files:test_process_input_files.py:115 crypmw 103 43.7 -57.79
WARNING test_process_input_files:test_process_input_files.py:115 helpow_+_helpow_cryal/1.0d6 0.21 0.0885 -57.79
WARNING test_process_input_files:test_process_input_files.py:115 qmisc/1.0d6 0.0651 0.0275 -57.79
WARNING test_process_input_files:test_process_input_files.py:115 bktcycles 5.87e+04 1.56e+04 -73.48
WARNING test_process_input_files:test_process_input_files.py:115 qac/1.0d6 0.0874 0.00381 -95.65
WARNING test_process_input_files:test_process_input_files.py:115 peakpoloidalpower 9.9e+03 180 -98.18
@mkovari<https://github.com/mkovari> @jmorris-uk<https://github.com/jmorris-uk> any thoughts?
—
Reply to this email directly, view it on GitHub<#3186 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ACKEVEGPVRUDR7L7WVY4CBTZFQUQHAVCNFSM6AAAAABH37BT5GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBUGYZTIMZVGI>.
You are receiving this because you were mentioned.
|
|
Well this one is weird. The old version doesn't even add up correctly: Part of the problem is that some of the times are calculated in Line 905 in dfcda59
etc. Anyway, all this just confirms that for the time being we need this revision. |
Ensure values in mfiles converge before finally writing them.
c04b289 to
f1eefa5
Compare
|
@timothy-nunn I think this is ready for a final review now I've resolved the conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonmaddock, all looking good now. I'll approve and leave you to merge at your leisure.
Remove superfluous else statements Fix type hints in caller.py Prevent circular import of process.main. Use output file prefix for idempotence mfile checks Improve naming and docstrings
Improve docstring in close_idempotence_files()
Improve type check in caller.py
f1eefa5 to
e4b35b8
Compare
To try to ensure idempotency in the result of the model evaluations whilst being efficient, the idea is to:
This will at least double model evaluations (and hence code runtime), as a comparison of previous and current results for the same optimisation parameter vector is required in each case.