Use 64-bit types, to prepare for >=2GB files
Needs Review · Public

Authored by frsfnrrg on Jun 3 2018, 11:34 PM.

Details

Reviewers
kossebau
Summary

Okteta currently does not support large (>2 GB) files, but should. This patch, which uses 64-bit types to refer to file memory locations and line indices, is a necessary and unavoidable step towards that goal.

[I'm not yet certain of the best remaining strategy towards >2 GB support; my local version of Okteta can load/modify/save large files, as I've replaced QByteArray with a LargeByteArray class with a matching API, but there are still display issues since the scroll area uses 32-bit position coordinates.]

Test Plan

Tests pass. This change only widens integer types, and hence does not introduce any new bugs for the existing supported range of file sizes, unless the overflow behavior of 32-bit integers was relied upon somewhere.

Diff Detail

Repository
R457 Okteta
Lint
Lint Skipped
Unit
Unit Tests Skipped
frsfnrrg requested review of this revision. Jun 3 2018, 11:34 PM
frsfnrrg created this revision.
frsfnrrg retitled this revision from Use 64-bit types to Use 64-bit types, to prepare for >=2GB files. Jun 3 2018, 11:46 PM
frsfnrrg edited the summary of this revision. (Show Details)
frsfnrrg edited the test plan for this revision. (Show Details)
frsfnrrg added a reviewer: kossebau.

Hi, thanks for your contribution proposal.

I agree with you that support for memory ranges beyond 32-bit address size is painfully missing, and should be added.

Sadly your patch arrives at an unfortunate time: I have just started a big rewrite of the internal framework called "Kasten" (in private repos for the initial work, for now). For that I had just done a big reformatting and C++11-ification of the whole old codebase, to have a clean and more modern base to start from. That refreshed codebase is also scheduled for release on June 11th as version 0.25, so that the then-supported version will be as similar as possible to the one on the workbench. That version is going to be the supported version for some time, without any feature additions, while all focus is on the rewrite (planned ETA: end of year at the latest).

So during that time I would also like to avoid any ABI-breaking changes to the Okteta libraries (as your patch makes), given that their public API is recommended for use by 3rd-party code, and at least KDevelop is using them.

BTW, when I looked at 64-bit support some years ago, next to the API of the rather quick'n'dirty QByteArray usage there were some other 32-bit things across the Qt API which called for custom class replacements, and the lazy-in-me let that delay things (I use Okteta only for smaller document-type files myself). I don't remember the details, though; they might be gone meanwhile in Qt5. In any case it will need some more checks and work first, like the issues you hinted you have already faced.

Then I hoped to combine any work on 64-bit (and beyond?) address ranges with support for loading/saving areas on demand, instead of requiring the complete file to be loaded as a copy. Though for that some research still needs to be done on when such partial loading works (it conflicts with other processes changing the same file), what APIs exist for it, which network/remote filesystems have built-in support, etc., so that the code model in Okteta can cover all those cases.
Of course that also means adding support for raw devices (to read complete filesystem images) or working memory, and deciding how to represent such large memories to the user, possibly even where the memory is non-contiguous.

All that seems intertwined to me, so I would like to have at least some rough ideas about each puzzle piece before work on shaping any one part is started.

What is your personal use case where you would need support for >2 GB?

When it comes to FLOSS hex editors with support for big files, have you seen https://www.wxhexeditor.org/ ? I've never used it myself, but from what I read some people do use it for their needs. Perhaps it can cover your >2 GB needs for now.

So in summary: thanks for your proposal, but right now its implications do not match the current development schedule of Okteta. Long-term, though, I will be happy to work with you on getting Okteta (or rather its libs) to properly support today's more common object memory sizes.

My personal use case: I sometimes have more RAM than convenient disk space, and 1-5 GB binary output files with a reasonably regular format. Okteta is a convenient tool for looking at the neighborhood of an anomaly in those files (e.g. when there's an unexpected inf halfway through), and hence it is mildly annoying when 2.01 GB files can't be opened. Workarounds are trivial, but distracting.

Then I hoped to combine any work on 64bit (and beyond?) address ranges with .... so that the code model in Okteta can cover all those things.

Using mmap is actually quite orthogonal to handling large files; see this patch for example. Don't worry about the general case, because it's impracticable to test, and combinatorially difficult in the number of things you're trying to do. Adding one thing at a time is a much easier strategy, and avoids the problem of predicting all the unexpected constraints in advance. You don't need a map to climb a hill.

So in summary: thanks for your proposal, but right now its implications do not match the current development schedule of Okteta. Long-term, though, I will be happy to work with you on getting Okteta (or rather its libs) to properly support today's more common object memory sizes.

That's fine; I'll keep my local patches up to date, and hopefully revisit this next year.