Read any type of "structured" data and output it in a tabular format.
- Organizations that need to transform client reports for input into an application database.
- Organizations that are migrating from spreadsheets and similar tools into an application (e.g. sales data into a CRM).
When this is done manually, someone looks over the data (commonly in Excel) and manipulates it into a standard tabular layout. Depending on the input data, this can be quite laborious. I propose using statistical techniques to determine header information and pattern recognition to classify the data.
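As a rough illustration of what "pattern recognition to classify the data" could mean at the cell level, here is a minimal sketch. The pattern table, the labels, and the function names (`PATTERNS`, `classify_cell`, `classify_row`) are all assumptions made for this example, not a settled design; a real version would need many more patterns (currencies, locale-specific dates, and so on).

```python
import re

# Hypothetical pattern table. Order matters: the first matching pattern wins,
# and "text" is the catch-all at the end.
PATTERNS = [
    ("empty", re.compile(r"^\s*$")),
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("decimal", re.compile(r"^[+-]?\d*\.\d+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),  # ISO dates only, as an assumption
    ("text", re.compile(r".*", re.S)),
]


def classify_cell(value):
    """Return the label of the first pattern that matches the cell string."""
    for label, pattern in PATTERNS:
        if pattern.match(value):
            return label
    return "text"


def classify_row(row):
    """Classify every cell in a row, giving the row a 'shape' of labels."""
    return [classify_cell(cell) for cell in row]


print(classify_row(["Region", "2021-03-01", "1500", "12.5", ""]))
```

Rows with the same sequence of labels are candidates for belonging to the same section of the input, which is one way the pieces might start to fit together.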
Consider a jigsaw puzzle: you may not get a piece right on the first try, but it will be obvious when you're wrong. Also, as more pieces are put into place, the puzzle becomes easier. The assumption is that while the input data is not structured for loading into a database, it does have structure, and rows or sections of cells will fit together much like puzzle pieces.
Have you ever accidentally sorted your headers into your data in Excel? Apart from the missing header row, it's pretty obvious that the headers don't fit with the rest of the data. Even in a dataset with duplicated headers, we can determine which rows don't fit by extracting features that differentiate the headers from the rest of the data.
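To make the "headers don't fit" idea concrete, here is a minimal sketch that uses a single feature: the coarse type of each cell compared with the most common type in its column. The function names (`cell_kind`, `mismatch_scores`, `likely_header_rows`) and the 0.5 threshold are assumptions for illustration only, and the sketch assumes a non-empty rectangular grid of strings. A real implementation would likely combine several features (string length, capitalization, uniqueness, and so on).

```python
from collections import Counter


def cell_kind(value):
    """Coarse cell type: 'empty', 'number', or 'text'."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        # Thousands separators are stripped; note float() also accepts
        # strings such as "nan" and "inf", which this sketch ignores.
        float(text.replace(",", ""))
        return "number"
    except ValueError:
        return "text"


def mismatch_scores(rows):
    """Per row, the fraction of cells whose kind differs from the column majority."""
    kinds = [[cell_kind(cell) for cell in row] for row in rows]
    majority = [Counter(column).most_common(1)[0][0] for column in zip(*kinds)]
    return [
        sum(kind != expected for kind, expected in zip(row, majority)) / len(majority)
        for row in kinds
    ]


def likely_header_rows(rows, threshold=0.5):
    """Indices of rows that disagree with their columns at least `threshold` of the time."""
    return [i for i, score in enumerate(mismatch_scores(rows)) if score >= threshold]


rows = [
    ["Acme", "1200", "3"],
    ["Region", "Sales", "Units"],  # header sorted into the data
    ["Bolt", "950", "7"],
    ["Region", "Sales", "Units"],  # duplicated header
    ["Core", "2100", "5"],
    ["Dyno", "400", "2"],
]
print(likely_header_rows(rows))  # [1, 3]
```

The first column is text in every row, so it contributes nothing. The two numeric columns are what expose the header rows, which suggests that this particular feature only helps when at least some columns hold non-text data.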