Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spreadsheets with workbook name other than /xl/workbook.xml in _rels/.rels cannot be opened #254

Closed
janhec opened this issue Jul 5, 2024 · 13 comments
Assignees
Labels
enhancement New feature or request resolved This issue has been resolved.

Comments

@janhec
Copy link

janhec commented Jul 5, 2024

workbook.xml is hardcoded in xldocument.cpp, but is actually variable.
I do not encounter this problem when (re)saving a file in excel (or libreoffice calc), the workbooks in question get outputted by a program that I need in my (customers) workflow.
Opening and saving in excel or libreoffice will cure this, but this is not a welcome addition to the (automated) workflow.
In these cases, the name will be /xl/workbook22.xml, which is not repeated in [Content_Types].xml (in contrast to workbook.xml in the ordinary case).
Some (copilot) prodding got me:
Variability in Workbook Paths:
You mentioned that sometimes the workbook path is workbook22.xml.
The specific name (e.g., workbook22.xml) can vary based on the workbook’s history, edits, and other factors.
Excel assigns unique names to different versions of the workbook, especially when you make changes or save multiple copies.
Why Different Names?:
The variation in workbook names allows Excel to manage different versions, track changes, and handle concurrent editing (e.g., when collaborating with others).
Each time you save the workbook, Excel may increment the number or use a different identifier to avoid overwriting existing files.

The last line seems a dumb addition, but otherwise it sounds appropriate to me, even if I did not find a trigger for this behavior.
The initial bug is that workbook.xml will (obviously) not be found in m_data so an exception is thrown.
I tried feeding m_data with data (including xmlid) from _rels.rels, which will make the open succeed apparently, but then the sheets will not be found (older version of openxlsx).
Another gotcha in this case is that shared strings has x:si instead of si which gets flagged in the may 25 revision.
So I am not confident to find a solution myself and hope for any comments.

@janhec
Copy link
Author

janhec commented Jul 6, 2024

workbook22.xml contains:
<x:workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><x:fileVersion appName="SpreadsheetLight" />...views...sheets...</x:workbook>
The workbook id in .rels is not echoed in the sheets, e.g. sheetId="1" r:id="R4b78b06d10804515" vs .rels: Target="/xl/workbook22.xml" Id="R8892f0194c3849b0".
So it seems a lot of additional code may be needed to cover this case and perhaps the "SpreadsheetLight" moniker will help.

@troldal
Copy link
Owner

troldal commented Jul 7, 2024

I would say that this not a bug in OpenXLSX, but a bug in whatever produced the workbook you are trying to open. In ECMA-376 (the official standard for the Office Open XML formats) part 1, section 12.2 stipulates that the workbook data must be physically located in the /workbook.xml file.

image

So a workbook file named workbook22.xml is clearly non-standard

@janhec
Copy link
Author

janhec commented Jul 8, 2024

Thank you for the info. I admit not reading the standard. Too lazy and/or busy with other things.

There remains a nag on my mind that the standard may not exclude older standards, so we should adhere to /xl/workbook.xml when creating new (and preferably, saving old) workbooks, but still be able to open this case.

Reasons: appName="SpreadsheetLight" in /xl/workbook22.xml and the fact that there is a library for this (https://github.com/ARLM-Keller/SpreadsheetLight, presumably relevant, did not check further).
Then (again) both excel and libreoffice will open these files; after an excel save, the (new) standard is applied.
From a worksheet the problematic xmlns (x:): xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
(I could not access this and similar through a browser.)

Looking at modern standard spec (in workbook.xml), I see e.g.:
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x15 xr xr6 xr10 xr2" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
which seems to include a lot of versions/revisions.

I have solved my problem in a rather hacky way by changing XL_Document and XLXmlData (cpp and hpp), took me a few hours.
After making workbook.xml (and workbook.xml.rels) variable (namewise) and getting the name from .rels, putting it in m_data:
When loading (and finding nonstandard workbook*.xml), I remove the problematic xmlns (x:) and record the distinct names by xml source; when saving, I put them back (hoping for no conflicts of any kind).

I have not investigated whether this is more generally adequate and useful.
I doubt you want to incorporate such mods, but if you wish to look at them, let me know.

No offense if you close this.

@aral-matrix
Copy link
Collaborator

aral-matrix commented Jul 9, 2024

@janhec : Out of curiosity: can you summarize the exact relation of all XML files and fields that are different with respect to your workbook22.xml?

I am not entirely sure how your _rels/.rels file is looking - is the workbook22.xml referenced like so?

<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook22.xml"/>

The important bit I am interested in here is whether or not the exact type string is reflected here, so that the document format is at least logically consistent and the workbook file name could be looked up by checking the _rels/.rels file for the relationship entry with that type.

@janhec
Copy link
Author

janhec commented Jul 10, 2024

The .rels file looks like this:
<?xml version="1.0" encoding="utf-8"?><Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"> <Relationship Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="/docProps/core.xml" Id="R1089e9da7ec94cca" /> <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="/xl/workbook22.xml" Id="R8892f0194c3849b0" /> </Relationships>

workbook22.xml goes like this:
<?xml version="1.0" encoding="utf-8"?><x:workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><x:fileVersion appName="SpreadsheetLight" /> <x:bookViews><x:workbookView /></x:bookViews> <x:sheets> <x:sheet name="Workspace" sheetId="1" r:id="R4b78b06d10804515" /> ... more sheets ... </x:sheets> </x:workbook>
The workbook filename can be looked up.
A bit of code I added in XLDocument.cpp, open() about line 460

std::string wbpath; bool wbfound = false;
for (auto rel : m_docRelationships.relationships()) {
    if (rel.type() == XLRelationshipType::Workbook) {
        wbpath     = rel.target();
        auto xmlid = rel.id();          // trying both id's: as they say, identical
        //xmlid = m_docRelationships.relationshipByTarget(wbpath).id();
        if (wbpath[0] == '/') wbpath.erase(0, 1);
        m_data.emplace_back(/* parentDoc */ this,
                            /* xmlPath   */ wbpath,
                            /* xmlID     */ xmlid,
                            /* xmlType   */ XLContentType::Workbook);
        wbfound = true;
        break;
    }
}
if (!wbfound) wbpath      = "xl/workbook.xml";
needsxcheck               = wbpath != "xl/workbook.xml";    // consider check appName="SpreadsheetLight" in xl/workbook*.xml
const std::string wkbrels = "xl/_rels/"   wbpath.substr(3)   ".rels";
m_data.emplace_back(this, wkbrels);    //, "", XLContentType::Workbook);
m_wbkRelationships = XLRelationships(getXmlData(wkbrels));

@aral-matrix
Copy link
Collaborator

Hmm... I am a bit worried about the worksheets r:id="R4b78b06d10804515" in workbook22.xml - OpenXLSX has a bunch of code that makes assumptions about how the r:id values are of the format "rId###" where ### is strictly numerical.

Could you provide a full example of one of your xlsx files with a workbook22.xml so that I can have a look at whether it could be supported without major changes?

A file that does load without problems in libreoffice would be great :)

@janhec
Copy link
Author

janhec commented Jul 11, 2024 via email

@aral-matrix
Copy link
Collaborator

Please treat the appended xlsx file as confidential in the sense of not sharing it widely

I am not sure if I am missing something w.r.t. how github is used, but I do not see an attachment. You can reach me via my codeberg profile https://codeberg.org/lars_uffmann - email is listed there.

@janhec
Copy link
Author

janhec commented Jul 11, 2024 via email

@aral-matrix
Copy link
Collaborator

my apologies, it indeed doesn't show up to non-registered users - I don't see your email either though :D

However, my project files contain my email, e.g. this one:
https://codeberg.org/lars_uffmann/fwmqueue/src/branch/master/demo.cpp (line #5) - I just want to avoid linking it directly here with this github account, I hate microsoft with a passion :D

@aral-matrix aral-matrix self-assigned this Aug 21, 2024
@aral-matrix aral-matrix added the enhancement New feature or request label Aug 21, 2024
@aral-matrix
Copy link
Collaborator

Issue resolution is implemented in my development branch, tested by @janhec & working as expected (correct me if I missed something) without impacting general function / performance - once I get a chance to coordinate with @troldal the change will hopefully make it here.

@aral-matrix aral-matrix added the testing Functionality has been implemented in development branch and is pending a merge into main label Sep 9, 2024
@aral-matrix
Copy link
Collaborator

Development branch development-aral with the desired functionality is now available here on the repo.

@aral-matrix
Copy link
Collaborator

And the functionality is merged into the main repo. Finally able to close this :)

@aral-matrix aral-matrix added resolved This issue has been resolved. and removed testing Functionality has been implemented in development branch and is pending a merge into main labels Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request resolved This issue has been resolved.
Projects
None yet
Development

No branches or pull requests

3 participants