Comparison of popular Excel libraries
21 June, 2017
Simple benchmark of popular library
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
| TEST_NAME | NUM_ROWS | NUM_COLS | TIME_IN_SECONDS | |-----------------------------|----------|----------|-----------------| | pyexcelerate value fastest | 1000 | 100 | 0.47 | | pyexcelerate value faster | 1000 | 100 | 0.51 | | pyexcelerate value fast | 1000 | 100 | 1.53 | | xlsxwriter value | 1000 | 100 | 0.84 | | openpyxl | 1000 | 100 | 2.74 | | pyexcelerate style cheating | 1000 | 100 | 1.23 | | pyexcelerate style fastest | 1000 | 100 | 2.40 | | pyexcelerate style faster | 1000 | 100 | 2.75 | | pyexcelerate style fast | 1000 | 100 | 6.15 | | xlsxwriter style cheating | 1000 | 100 | 1.21 | | xlsxwriter style | 1000 | 100 | 4.85 | | openpyxl | 1000 | 100 | 6.32 | |
Since the data size accessing is relatively large, approx. >50 excel files, >10 sheets in each file and >1k rows in each sheet, performance become very important.
OpenPyXL, LibXL can provide both write & read functions
xlrd for reading only
xlwt, PyExcelerate for writing only
OpenPyXL = Free with relatively slow performance
xlwt = Free with no limit but 2007 Excel (XML format) .xlsx is not supported
LibXL (C++) = extremely fast performance but with 300 cells read limit for free trial
xlrd = Free with no limit, support xlsx reading and fast performance
PyExcelerate = Free with no limit, support xlsx writing and fast performance
After the implementation of all libraries, I have decided to read with xlrd and write with PyExcelerate, the development was smooth and clean, no collision occurs between these two libraries, however, PyExceleratebehaves slightly different comparing to xlrd.
1. For example, accessing cell value A4, it is (3, 0) for xlrd but (4, 1) for PyExcelerate as xlrd is zero-based, corruption occurs when trying to write to col#0 or row#0.
2. Reading error value like “#VALUE!” is relatively easy in OpenPyXL as it’s directly stored as string”#VALUE!”, however, it is stored as “15” for xlrd in my example.