Identify in CSV file wrongly printed questionnaires #265

Open
Miguel-Hermanns opened this issue Jan 18, 2024 · 8 comments

Comments

@Miguel-Hermanns

Dear all,

While processing a batch of questionnaires, I identified several that were wrongly printed: double-sided questionnaires with only one printed face, either the front or the back, because the printer fed two sheets of paper together. And the person filling out the questionnaire did not care about it.

I would like to add a way of spotting this issue to my post-processing. To that end, I have looked at the error codes exported to CSV, but with them it is not possible to distinguish this issue from, for instance, a blank sheet of paper among the scanned pages. Or is there a way?

Thanks in advance and best regards,

Miguel

@benzea
Member

benzea commented Jan 18, 2024

Uh, good question, I fear I need to think about that a bit …

I suppose the "reorder" code could deal with it somehow. But I suspect we need some "blank page" detection, as the code right now assumes that front/back of a page always match. It could be as simple as "no corner marks and no barcode present".

Do you have a unique questionnaire ID printed? Without that reorder would probably be useless …

@Miguel-Hermanns
Author

I have never tried the reorder command as all my questionnaires are single double-sided pages. And I don't use unique questionnaire IDs either 😕.

It would be useful to distinguish foreign pages, as I call them, from questionnaires with this kind of issue. Maybe it could be another option in the flag that marks whether there is an issue with a given questionnaire (double mark, etc.). Or the flag used to mark incomplete multi-page questionnaires could also apply to incomplete single-page questionnaires, or to multi-page questionnaires with a missing side in one of the pages.

Thanks and greetings from Madrid,

Miguel

@benzea
Member

benzea commented Jan 18, 2024

One problem is that SDAPS already sorts the pages into questionnaires at the start when adding the images. And that happens without even running an initial identification step (I guess you could call that a neat design flaw).

I think you are right that you are not getting any information in the CSV (other than nothing being checked on the questions of that page). The only quick idea I have would be a guess based on whether the page has a valid correction matrix. Not sure how to put that into the CSV file though; images is a list (which I guess might not be exactly of length 2 in theory).
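As a rough illustration of the "nothing being checked on the questions of that page" signal, here is a minimal post-processing sketch. It is not part of SDAPS: the column names, the mapping of columns to pages, and the convention that an unchecked box is exported as `0` or an empty cell are all assumptions that would have to be adapted to the actual export.

```python
import csv
import io


def pages_without_marks(row, page_columns):
    """Return the pages for which no column in this CSV row holds a mark."""
    return [
        page
        for page, columns in page_columns.items()
        # Assumption: an unchecked box is exported as "0" or as an empty cell.
        if all((row.get(col) or "").strip() in ("", "0") for col in columns)
    ]


def find_half_empty(csv_text, page_columns):
    """List (data_row_number, empty_pages) for rows where some, but not all, pages are empty."""
    flagged = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        empty = pages_without_marks(row, page_columns)
        if 0 < len(empty) < len(page_columns):
            flagged.append((number, empty))
    return flagged


# Hypothetical export: two checkbox columns on each face of the sheet.
sample = "id,p1_a,p1_b,p2_a,p2_b\n1,1,0,0,1\n2,1,1,0,0\n3,0,0,0,0\n"
columns_by_page = {1: ["p1_a", "p1_b"], 2: ["p2_a", "p2_b"]}
print(find_half_empty(sample, columns_by_page))
```

This only yields candidates: a respondent who skipped a whole face looks the same as a face that was never printed, and a sheet with no marks at all (row 3 in the sample) is deliberately left out because it is more likely a foreign page.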

It is a bit of a mess :-/

Greetings from Munich (though I was in Madrid not long ago).

@Miguel-Hermanns
Author

But the initial step of sorting pages into questionnaires is not a problem, is it? Afterwards you check whether it is a questionnaire or not, whether the marks are correct or not, etc. It would be in that check step that you would look for the presence of all pages/faces of the questionnaire. The good thing about the issue I'm looking at is that both faces of the questionnaire are always scanned and present in the PDF document being imported. It is just that one of the two faces is not as expected.

@benzea
Member

benzea commented Jan 19, 2024

It isn't a fundamental problem, no. More of a question where to hook it in conceptually.

I am thinking:

  1. Set a per-page attribute during "recognize -i" that the page is empty (-i as identification should be sufficient here)
  2. Implement a simple logic to merge two consecutive sheets into one (similar to "reorder", but much simpler). Maybe just a "reorder" special case if there is no questionnaire ID (i.e. assume consecutive pages belong together).

For 1., I guess we could think up various heuristics (that only kick in if there is no barcode):

  • Mark page if it is almost blank (this would work well for you)
  • Mark page if corner marks were not found. To be robust against bad scans, I think it might make sense to assume a page is a bad scan if we found 2 of the corner marks (there are 4 corner marks; with 3 the page can be recognized, 2 -> assume bad scan, 1 or 0 -> assume blank)

Note that obviously you need to make sure that the affected questionnaires are on consecutive scanned pages. But I guess you can assume that …
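The heuristics above could be sketched roughly as follows. This is only an illustration of the decision logic, not SDAPS code: the function name, the `ink_fraction` input (share of dark pixels on the page), and the threshold value are assumptions made for the example.

```python
def classify_page(has_barcode, corner_marks_found, ink_fraction, blank_threshold=0.005):
    """Classify a scanned page as 'questionnaire', 'bad_scan' or 'blank'.

    has_barcode: whether a barcode was decoded on the page.
    corner_marks_found: number of corner marks detected (0 to 4).
    ink_fraction: hypothetical measure, fraction of dark pixels on the page.
    blank_threshold: assumed cut-off below which a page counts as almost blank.
    """
    # The heuristics only kick in if there is no barcode.
    if has_barcode:
        return "questionnaire"
    # Heuristic 1: the page is almost blank.
    if ink_fraction < blank_threshold:
        return "blank"
    # Heuristic 2: count the corner marks that were found.
    if corner_marks_found >= 3:
        return "questionnaire"  # 3 of 4 marks are enough to recognize the page
    if corner_marks_found == 2:
        return "bad_scan"
    return "blank"  # 1 or 0 marks


print(classify_page(False, 2, 0.10))
print(classify_page(False, 0, 0.001))
```

Keeping the barcode check first means a page that was identified normally is never reclassified by the fallback heuristics.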

@benzea
Member

benzea commented Jan 19, 2024

btw. how urgent is this for you?

@Miguel-Hermanns
Author

Not urgent at all. I had the issue with 5 out of 2100 questionnaires and was able to post-process them by marking them as valid questionnaires in the GUI and entering the marked answers by hand. And I doubt similar issues will arise in the future, as the printing shop at our university is informed and trying to figure out what went wrong during the printing of the questionnaires (it is the first time I have encountered this issue, and I have been using sdaps twice a year since 2020).

Regarding your brainstorming, maybe a more robust solution would be to simply mark the kind of questionnaires we are talking about as "identified questionnaire with issues". Let's say you consider a questionnaire valid and flawless if you can identify all of its pages/faces in the scanned document, in consecutive pages of the PDF of course, as the two faces of a sheet will always appear consecutively. The key thing, I think, is to identify the QR codes from the front and back faces of each page. That lets us ensure the integrity of the questionnaire, or of the page of the questionnaire.
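A minimal sketch of that integrity check, assuming some earlier step has already decoded the page number from the code on each scanned image (`None` where no code was found) and that the images are listed in scan order. The function name and the status labels are hypothetical and not part of sdaps.

```python
def check_sheets(decoded_pages, expected=(1, 2)):
    """Group consecutive scanned images into sheets and report missing faces.

    decoded_pages: page number decoded from each scanned image, in scan order,
                   or None if no code could be read on that image.
    expected: page numbers that make up one complete questionnaire.
    Returns a list of (sheet_index, status, missing_pages).
    """
    results = []
    size = len(expected)
    for index, start in enumerate(range(0, len(decoded_pages), size)):
        sheet = decoded_pages[start:start + size]
        # Membership test, so a sheet scanned back side first still counts.
        missing = [page for page in expected if page not in sheet]
        if not missing:
            status = "complete"
        elif len(missing) == size:
            status = "foreign"  # no face of the questionnaire was identified
        else:
            status = "incomplete"  # identified questionnaire with issues
        results.append((index, status, missing))
    return results


# Four sheets: a good one, one without back face, a foreign sheet, one without front face.
for entry in check_sheets([1, 2, 1, None, None, None, None, 2]):
    print(entry)
```

The "incomplete" status corresponds to the wrongly printed questionnaires discussed here, while "foreign" covers sheets on which no code was found at all.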

@Miguel-Hermanns
Author

By the way, if you are going to think about this in the coming days, maybe you can also think about the application of sdaps I mentioned in issue #239, as one of the solutions we discussed at that time was to create the different versions as part of the same survey 🙄. But then all questionnaires would be flagged as wrong/incomplete.
