Using Excel files from Python
Intro
The other day I had to create a report in Excel using Python for work.
Python has no shortage of libraries to create Excel files. There are portable solutions that would work on Windows and Linux, as well as solutions that only run on Windows and depend on Excel being available.
They have their pros and ther cons. This is a run down of the possible options.
XlsxWriter
The first option I tried was XlsxWriter. This solution is cross-platform, being compatible with Linux and Windows. It does not have many dependancies. It however is limited to writing only and does not support creating VBA macros in the created Excel files.
XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format.
XlsxWriter can be used to write text, numbers, formulas and hyperlinks to multiple worksheets and it supports features such as formatting and many more, including:
- 100% compatible Excel XLSX files.
- Full formatting.
- Merged cells.
- Defined names.
- Charts.
- Autofilters.
- Data validation and drop down lists.
- Conditional formatting.
- Worksheet PNG/JPEG/GIF/BMP/WMF/EMF images.
- Rich multi-format strings.
- Cell comments.
- Integration with Pandas and Polars.
- Textboxes.
- Support for adding Macros.
- Memory optimization mode for writing large files.
It supports Python 3.4+ and PyPy3 and uses standard libraries only.
It is ideal for generating reports and has a very pythonic API.
openpyxl
This is the second option. The main advantage over XlsxWriter is that allows reading existing xlsx files as well as writing. Like XlsxWriter it is a portable library, able to run on Windows and Linux. It does not have any strange dependancies. The main disadvantage I found is that it is a bit temperamental, and the order of how you do things can impact the output of the generated file. The API specifically is a bit wonky with things being non-intuitive. Also, like XlsxWriter, openpyxl does not support reading or writing VBA macros.
openpyxl is a Python library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files.
It was born from lack of existing library to read/write natively from Python the Office Open XML format.
It was initially based on the PHPExcel library.
If you have to read and write Excel files on Linux as well as Windows, this seems to be the way to go. However, be prepared for a steep learning curve and do a lot of trial and error to get things working right.
xlwings
The third option is xlwings. This is a freemium library that has multiple versions with different capabilities. These are:
- xlwings open source
xlwings (Open Source) is a BSD-licensed Python library that makes it easy to call Python from Excel and vice versa:- Scripting: Automate/interact with Excel from Python using a syntax close to VBA.
- Macros: Replace VBA macros with clean and powerful Python code.
- UDFs: Write User Defined Functions (UDFs) in Python (Windows only)
- xlwings pro
xlwings pro adds these features:- One-click Installer: Easily build your own Python installer including all dependencies—your end users don’t need to know anything about Python.
- Embedded code: Store your Python source code directly in Excel for easy deployment.
- xlwings Reports: A template-based reporting mechanism, allowing business users to change the layout of the report without having to touch the Python code.
- Markdown Formatting: Support for Markdown formatting of text in cells and shapes like e.g., text boxes.
- Permissioning of Code Execution: Control which users can run which Python modules via xlwings.
- Table.update(): An easy way to keep an Excel table in sync with a pandas DataFrame
- xlwings lite
xlwings Lite brings the VBA experience into the modern age by offering a privacy-first, secure, and developer-friendly way to automate Excel and write custom functions with Python. - xlwings server
xlwings Server adds Python support to Microsoft Excel and Google Sheets without the need of a local Python installation. xlwings Server is self-hosted and runs on any platform that supports Python or Docker, including bare-metal servers, Linux-based VMs, Docker Compose, Kubernetes and serverless products like Azure functions or AWS Lambda.
For the simple use case of reading and writing Excel file data it is a bit of an overkill. It does support adding VBA code to Excel files, however there are limitations associated with the VBA object model. So while code can be added, classes and expred modules can not.
pywin32
You can always use the Python for Win32 with COM support. This gives you access to the Component Object Model (COM), which is the native automation functionality of Windows which Excel supports.
This is a quick example:
import win32com.client
import sys, io
# Open up Excel and make it visible (actually you don't need to make it visible)
excel = win32com.client.Dispatch('Excel.Application')
excel.Visible = True
# Redirect the stdout to a file
orig_stdout = sys.stdout
bk = io.open("Answers_Report.txt", mode="w", encoding="utf-8")
sys.stdout = bk
# Select a file and open it
file = "path_of_file"
wb_data = excel.Workbooks.Open(file)
# Get the answers to the Q1A and write them into the summary file
mission=wb_data.Worksheets("1ayb_MisiónyVisiónFutura").Range("C6")
vision =wb_data.Worksheets("1ayb_MisiónyVisiónFutura").Range("C7")
print("Question 1A")
print("Mission:",mission)
print("Vision:" ,vision)
print()
Conclusions
For strictly writing Excel files, XlsxWriter is more than enough. It is easier to use and have a nice Pythonic API.
For reading and writing Excel files, openpyxl is enough, despite its flaws.
If you need limited support for VBA code, xlwings is the best option.
Hints and Tips
OpenPyxl
openpyxl has a lot of quirks. Here is a number of things I discovered:
Data Validation
This is documented in openpyxl Data Validation and data validation module.
Data validators can be applied to ranges of cells but are not enforced or evaluated. Ranges do not have to be contiguous: eg. “A1 B2:B5” is contains A1 and the cells B2 to B5 but not A2 or B2.
Example:
# Create a data-validation object with list validation
dv = DataValidation(type="list", formula1='"Dog","Cat","Bat"', allow_blank=True)
The formula1
is a formula that can contain a list. The list is comma separated.
You can use double quotes if you need to escape values with commas. Double quotes can
be escaped by entering two double quotes. (i.e. in general, it folows Excel formula
syntax).
The other strange thing is that if you want to show a drop down box with the list options
you can add the keyword argument showDropDown
. If you want to see the dropdown, set
this to False. If you want to hide it, set it to True.
Creating groups
For creating groups you can use
worksheet.column_dimensions.group(start, end, **kwargs)
worksheet.row_dimensions.group(start, end, **kwargs)
Keyword args:
- hidden : hide group
- outline_level : for multilevel groupings
There a further restrictions with creating groups, in that:
- you must create the cells in the groups before grouping
- if you are using multi-level groups, you must create the outer-group first before creating the inter groups.
xlwings
Some tips when using xlwings:
Steps to Enable Programmatic Access to VBA
- Open Excel:
- Launch Excel (you don't need to open a specific workbook).
- Access Options:
- Click on
File
in the top menu. - Select
Options
from the menu to open the Excel Options dialog.Options
item should be at the bottom of the menu.
- Click on
- Trust Center:
- In the Excel Options dialog, click on
Trust Center
in the left pane. - Click on
Trust Center Settings...
to open the Trust Center settings.
- In the Excel Options dialog, click on
- Macro Settings:
- In the Trust Center, click on
Macro Settings
. - Check the option labeled "Trust access to the VBA project object model".
- In the Trust Center, click on
- Save Settings:
- Click
OK
to save the Trust Center settings. - Click
OK
again to exit the Excel Options dialog.
- Click
Additional Considerations
- Security Implications: Enabling programmatic access to the VBA project object model can expose your system to potential security risks, especially if you run untrusted code. Ensure that the code you execute is from a trusted source.
- IT Policies: If you are using Excel in a corporate environment, these settings might be managed by IT policies, and you may need to contact your IT department for assistance.