Good Habits
The long partition is built for work that can survive interruption. If you use it, assume the job may be paused and resumed later.
A few habits make an interrupted job much easier to recover:
- write output incrementally
- checkpoint often enough to recover cleanly
- test restart logic before you launch a long run
- avoid requesting more resources than the job really needs
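
As a rough illustration of the first three habits, here is a minimal Python sketch of a restartable loop. Everything in it is an assumption made for the example, not part of the cluster's tooling: the function names (load_checkpoint, save_checkpoint, run), the JSON checkpoint format, and the stop_after parameter, which exists only to simulate an interruption so restart logic can be tested. The checkpoint records both the next step and how many bytes of output were valid at that point, so a resumed run can discard partial output instead of duplicating it.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "output_bytes": 0}


def save_checkpoint(path, state):
    """Write the checkpoint to a temp file, then rename it into place.

    os.replace is atomic, so an interruption leaves either the old
    checkpoint or the new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run(total_steps, checkpoint_path, output_path,
        checkpoint_every=10, stop_after=None):
    """Process steps, writing one output line per step.

    Returns True if all steps finished, False if stopped early.
    stop_after is a hypothetical test hook that simulates an interruption.
    """
    state = load_checkpoint(checkpoint_path)

    # Drop any output written after the last checkpoint so that the
    # resumed run does not duplicate those lines.
    with open(output_path, "ab") as out:
        out.truncate(state["output_bytes"])

    with open(output_path, "ab") as out:
        for step in range(state["next_step"], total_steps):
            if stop_after is not None and step >= stop_after:
                return False
            # Placeholder for real work: write the result incrementally.
            out.write(f"{step},{step * step}\n".encode())
            if (step + 1) % checkpoint_every == 0:
                out.flush()
                os.fsync(out.fileno())
                save_checkpoint(checkpoint_path, {
                    "next_step": step + 1,
                    "output_bytes": out.tell(),
                })
        # Final checkpoint so that a rerun after completion does nothing.
        out.flush()
        os.fsync(out.fileno())
        save_checkpoint(checkpoint_path, {
            "next_step": total_steps,
            "output_bytes": out.tell(),
        })
    return True
```

To test restart logic before a long run, call run once with stop_after set partway through, call it again without stop_after, and check that the output matches what a single uninterrupted run would produce. How often to checkpoint is a trade-off: more frequent checkpoints lose less work on interruption but spend more time on I/O.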
For small sanity checks, use debug. For normal production work, use normal. Save long for jobs that genuinely benefit from restartability.
Before you scale up, it is worth doing one small run first:
- Test on debug.
- Confirm the script works.
- Confirm output paths are correct.
- Confirm the resource request is reasonable.
- Move to normal or long once the job behaves the way you want.