Extract MSI Files on Linux and Windows: Research & Solution

Introduction

This article explores several methods for extracting MSI files on both Linux and Windows, focusing on automating the process from the command line on Ubuntu Linux. My goal was an effective way to unpack MSI files without user intervention. Existing tools offer some of this functionality, but they often fall short on Linux. After evaluating several tools that work well on Windows but not on Linux, I developed a solution that uses the Python ctypes module on top of the Windows ‘msi.dll’ and runs on both platforms, including Linux through Wine. If you’re eager to see my solution, scroll directly to it.

What is MSI?

Microsoft Installer (MSI) files are a widely used format for software installation packages on Windows. Understanding the nature and function of MSI files is crucial for anyone looking to manage software installations efficiently.

Definition and Purpose of MSI

MSI files, or Microsoft Installer files, are the primary format for Windows installation packages. These files are designed to install, maintain, and remove software on Windows systems. The purpose of MSI files is to streamline and standardize the installation process, ensuring that applications are installed correctly and consistently across different systems. Developers can automate the installation process using MSI files and improve the user experience.

History and Development of MSI Files

Before MSI, software installations often involved custom scripts and proprietary installers, which could be unreliable and difficult to manage. The introduction of the Microsoft Installer service with Windows 2000 marked a significant step forward in software installation technology. MSI files standardized the installation process, making it easier for developers to create robust installers and for users to manage their software.

Everyday Use Cases for MSI Files

MSI files are commonly used for various purposes in the Windows ecosystem. One primary use case is deploying enterprise software, where MSI files facilitate mass installations across numerous machines. They are also used for updating and patching existing software, ensuring that updates are applied uniformly. Additionally, MSI files are valuable for administrators who need to manage software installations through group policies, as they support features like silent installation and rollback, which are essential for maintaining system stability.

Structure of MSI Files (OLE Streams, SQL Tables)

The structure of MSI files is based on a combination of OLE (Object Linking and Embedding) streams and SQL tables. This hybrid structure allows MSI files to store complex installation data in a highly organized manner. OLE streams provide a way to store various data types, such as binary files and scripts, within the MSI file. SQL tables, on the other hand, are used to manage the relationships and dependencies between different installation components. This structured approach ensures that all necessary information for the installation process is contained within a single file. Unfortunately, the structure of MSI files makes the extraction process complex, especially without installation or the option to use msiexec’s administrative install.
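
As a small illustration of the OLE container mentioned above, the following Python sketch checks whether a file begins with the OLE compound-file signature. The function name `looks_like_msi` is hypothetical, and the check is only a first filter: other OLE-based formats (such as legacy Office documents) share the same signature, so a match does not prove the file is an MSI.

```python
# First eight bytes of every OLE compound file (structured storage) container.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_msi(path):
    """Return True if the file starts with the OLE compound-file signature.

    Hypothetical helper: it only proves the container type, not that the
    file is a valid Windows Installer database.
    """
    with open(path, "rb") as handle:
        return handle.read(len(OLE_MAGIC)) == OLE_MAGIC
```

A check like this can be useful in automation to skip files that merely carry an .msi extension before handing them to a heavier extraction tool.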

MSI File Extraction

Extracting MSI files is an important task for many IT professionals and developers. Understanding the process is essential for efficient software deployment and management, especially when dealing with operating systems like Linux and Windows. This section explains the importance of extracting MSI files, everyday use cases, and the complexities involved, particularly on Linux systems.

Importance of Extracting MSI Files

Extracting MSI files allows users to access the contents of an installation package without performing the complete installation. This capability is vital for software auditing, troubleshooting installation issues, and repackaging software for deployment in specific environments. Administrators can inspect files, scripts, and other components by extracting MSI files, ensuring the software complies with security and operational standards.

From an information security standpoint, extracting MSI files can help ensure software installations’ safety by verifying that it is trusted software. Security professionals can inspect MSI files for malicious code, unauthorized changes, or vulnerabilities before deployment by unpacking MSI files. Additionally, extracting MSI files allows for thorough audits and compliance checks, reinforcing an organization’s security posture. Understanding the structure and contents of MSI files is a critical step against potential threats, making the extraction process an essential practice in information security.

The Complexity of Extraction

The structure of MSI files makes the extraction process complex. This complexity arises from the organization of OLE streams and SQL tables that store the installation data. Without installation or the option to use msiexec’s administrative install, extracting MSI files requires a deep understanding of these structures and the data they contain. The need to parse and interpret SQL tables, which hold essential information about the components and their relationships, adds to the task’s difficulty.

Even if you extract the OLE streams as files (for example, with 7-Zip), including the CAB files that contain the application’s installation files, the process still presents challenges. The files inside these CABs do not carry their original names: most are named with alphanumeric identifiers that correspond to entries in the SQL tables, and those entries hold the real names. Simple CAB extraction does not perform this renaming, so the extracted files are of little use at first. Understanding this is crucial for unpacking MSI files effectively, because it takes more than extracting CAB files to ensure that all components are correctly identified and usable.
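
To make the renaming step concrete, here is a minimal Python sketch. It assumes you have already read the File table into a dictionary that maps the ‘File’ column (the name each member has inside the CAB) to the ‘FileName’ column, which Windows Installer stores as ‘short|long’. The function names and the sample keys in the usage note are hypothetical, and the sketch deliberately ignores the directory layout and name collisions, which a complete solution must resolve through the other tables.

```python
import os


def long_file_name(file_name_field):
    """Return the long name from a File-table FileName value ('short|long')."""
    return file_name_field.split("|")[-1]


def rename_extracted(directory, file_table):
    """Rename extracted CAB members from their File-table key to their real name.

    file_table maps the 'File' column to the 'FileName' column.
    Returns the list of new paths. Target directories are NOT resolved here.
    """
    renamed = []
    for key, file_name_field in file_table.items():
        source = os.path.join(directory, key)
        if not os.path.isfile(source):
            continue  # this member was not in the extracted CAB
        target = os.path.join(directory, long_file_name(file_name_field))
        os.replace(source, target)
        renamed.append(target)
    return renamed
```

For example, `rename_extracted("unpacked", {"F1_app": "APP~1.EXE|application.exe"})` would turn ‘unpacked/F1_app’ into ‘unpacked/application.exe’ if that file exists.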

The Complexity of MSI Extraction on Linux

Extracting MSI files on Linux introduces additional complexities. Many tools and methods designed for Windows environments do not function properly on Linux. The lack of native support for MSI files on Linux means that users often rely on compatibility layers like Wine or develop custom scripts to handle the extraction. My solution uses Python’s ctypes module to call “msi.dll” (the core of the Windows Installer), which ships with Windows and is also built into Wine on Linux, to overcome these challenges, providing a reliable method to extract MSI files on both Linux and Windows.

Understanding msi.dll and Its Role to Extract MSI

Msi.dll is a crucial Windows library that provides the core functionalities for interacting with MSI files. This dynamic link library is part of the Windows Installer service. It offers the necessary APIs for creating, modifying, and extracting MSI packages. The msi.dll is needed for MSI extraction because it facilitates the low-level operations required to access and manipulate the contents of MSI files, including reading database tables, handling file streams, and executing custom actions. Without msi.dll, tools and scripts would have to implement their own interface to operate the MSI files. This library’s capabilities highlight the complexity of extracting MSI files on Linux, where equivalent support may not be readily available.
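
To give a feel for what “reading database tables” through msi.dll looks like, here is a minimal ctypes sketch that lists the ‘File’ and ‘FileName’ columns of the File table. It is not the solution presented later in this article, only an illustration under assumptions: it must run in a Windows build of Python (natively or under Wine), the helper names are hypothetical, and only documented Windows Installer functions (MsiOpenDatabaseW, MsiDatabaseOpenViewW, MsiViewExecute, MsiViewFetch, MsiRecordGetStringW, MsiCloseHandle) are called.

```python
import ctypes
import sys

ERROR_SUCCESS = 0
ERROR_MORE_DATA = 234
ERROR_NO_MORE_ITEMS = 259


def load_msi_dll():
    """Load msi.dll and declare the few functions used below."""
    if sys.platform != "win32":
        raise OSError("msi.dll needs a Windows build of Python (native or under Wine)")
    handle = ctypes.c_ulong  # MSIHANDLE is a 32-bit unsigned integer
    msi = ctypes.WinDLL("msi")
    msi.MsiOpenDatabaseW.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.POINTER(handle)]
    msi.MsiDatabaseOpenViewW.argtypes = [handle, ctypes.c_wchar_p, ctypes.POINTER(handle)]
    msi.MsiViewExecute.argtypes = [handle, handle]
    msi.MsiViewFetch.argtypes = [handle, ctypes.POINTER(handle)]
    msi.MsiRecordGetStringW.argtypes = [handle, ctypes.c_uint, ctypes.c_wchar_p,
                                        ctypes.POINTER(ctypes.c_ulong)]
    msi.MsiCloseHandle.argtypes = [handle]
    return msi


def check(code, action):
    """Raise if a Windows Installer call did not return ERROR_SUCCESS."""
    if code != ERROR_SUCCESS:
        raise OSError(f"{action} failed with Windows Installer error {code}")


def record_string(msi, record, field):
    """Read one string field (1-based index), growing the buffer if needed."""
    size = ctypes.c_ulong(256)
    buffer = ctypes.create_unicode_buffer(size.value)
    code = msi.MsiRecordGetStringW(record, field, buffer, ctypes.byref(size))
    if code == ERROR_MORE_DATA:
        size.value += 1  # the reported length excludes the terminating null
        buffer = ctypes.create_unicode_buffer(size.value)
        code = msi.MsiRecordGetStringW(record, field, buffer, ctypes.byref(size))
    check(code, "MsiRecordGetStringW")
    return buffer.value


def iter_file_table(msi_path):
    """Yield (File, FileName) pairs from the File table of an MSI database."""
    msi = load_msi_dll()
    database, view = ctypes.c_ulong(), ctypes.c_ulong()
    # Passing NULL as the persist argument means MSIDBOPEN_READONLY.
    check(msi.MsiOpenDatabaseW(msi_path, None, ctypes.byref(database)), "MsiOpenDatabaseW")
    try:
        query = "SELECT `File`, `FileName` FROM `File`"
        check(msi.MsiDatabaseOpenViewW(database, query, ctypes.byref(view)),
              "MsiDatabaseOpenViewW")
        try:
            check(msi.MsiViewExecute(view, 0), "MsiViewExecute")
            while True:
                record = ctypes.c_ulong()
                code = msi.MsiViewFetch(view, ctypes.byref(record))
                if code == ERROR_NO_MORE_ITEMS:
                    break
                check(code, "MsiViewFetch")
                try:
                    yield record_string(msi, record, 1), record_string(msi, record, 2)
                finally:
                    msi.MsiCloseHandle(record)
        finally:
            msi.MsiCloseHandle(view)
    finally:
        msi.MsiCloseHandle(database)
```

Calling `dict(iter_file_table(r"C:\msi\setup.msi"))` would give a mapping from each file key to its stored name, which is the kind of information a tool needs when it cannot rely on msi.dll and has to parse the database itself.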

Tools Available for Extracting MSI Files

Various tools are available for extracting MSI files, each offering unique features and capabilities. These tools facilitate the extraction process on both Windows and Linux systems. This section explores the tools I tested before implementing my solution and discusses their features, usage, and limitations.

MSIEXEC

Overview and Features

MSIEXEC is an executable (included with Windows and runs as a service) that can be used as a command-line utility to install and perform administrative tasks on MSI files. In addition, it can manage post-installation tasks, like uninstall. It supports various command-line options that enable users to extract files, apply patches, and generate log files for troubleshooting.

How to Use MSIEXEC for Extraction

Using MSIEXEC to extract MSI files is straightforward and involves executing specific command-line instructions. To extract the MSI file, you will use the “administrative install” feature with one of the following commands:

msiexec /a path\to\file.msi /qb TARGETDIR=path\to\extract\to
msiexec /a path\to\file.msi /qn TARGETDIR=path\to\extract\to

Note: “/qb” will show an extraction progress bar, while “/qn” will not show anything.

An example to unpack “C:\msi\setup.msi” file to “C:\msi\unpacked”:

msiexec /a "C:\msi\setup.msi" /qb TARGETDIR="C:\msi\unpacked"

This command initiates an administrative installation, extracting all files to the specified target directory. The /a switch indicates administrative mode, while /qb specifies a “basic user interface” during the process. This method is handy for unpacking MSI files without performing a complete installation.
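
For scripted use on Windows, the same command can be assembled and launched from Python. The sketch below is an assumption about how you might wrap it, not part of msiexec itself; the function names are hypothetical. The command is built as a single string so that the TARGETDIR="..." quoting stays exactly as in the example above.

```python
import subprocess


def admin_install_command(msi_path, target_dir, show_progress=False):
    """Build the msiexec administrative-install command line as one string."""
    ui_level = "/qb" if show_progress else "/qn"
    return f'msiexec /a "{msi_path}" {ui_level} TARGETDIR="{target_dir}"'


def extract_with_msiexec(msi_path, target_dir):
    """Run the administrative install and return msiexec's exit code (Windows only)."""
    command = admin_install_command(msi_path, target_dir)
    # On Windows, subprocess accepts a full command-line string without a shell.
    return subprocess.run(command).returncode
```

Checking the returned exit code in your automation lets you detect a failed extraction instead of silently continuing with an empty target directory.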

Limitations and Considerations of Using MSIEXEC

Despite its powerful capabilities, MSIEXEC has some limitations that must be considered. Firstly, it is a Windows-specific tool, which means it cannot be used natively on Linux systems without an emulation layer like Wine. Additionally, the administrative installation mode may not extract specific custom actions or embedded scripts executed during a complete installation. Users must also ensure they have administrative privileges to run MSIEXEC commands effectively. Understanding these limitations is crucial for effectively using MSIEXEC as part of your MSI extraction toolkit.

Wine on Linux does include msiexec.exe by default. The problem is that msiexec’s administrative install feature is not supported, even in the latest development version of Wine (currently 9.9). You can try performing a regular installation with “/I” instead of “/a,” but some MSI files have prerequisites that must be installed first, so the installation will fail, making this method useless for automation in production.

You can try copying msiexec.exe from a working Windows installation. However, you would need to work out which additional files (such as “msi.dll”) and, probably, which registry settings must be copied along with it.

7-Zip

Overview and Features

7-Zip is a popular open-source file archiver known for its high compression ratio and extensive support for various file formats, including MSI files (extracting only). 7-Zip offers a command-line interface suitable for automated extraction tasks on Windows and Linux. Its ability to handle various compression formats and encryption methods makes 7-Zip a versatile tool for managing archives. Additionally, 7-Zip’s lightweight nature and powerful performance make it a preferred choice for many users who need to extract MSI files and other archives efficiently.

How to Use 7-Zip for Extraction

Download a 7-Zip installer suitable for your needs. You can download the MSI version if you want to install it silently with:

msiexec /I 7z-x64.msi /qn

Alternatively, you can download the EXE version and either install it or extract it with another tool (like WinRAR) to use it as a portable version.

Using 7-Zip to extract MSI files involves simple command-line instructions. To extract the MSI file, you can use the following command:

7z x path\to\file.msi -opath\to\extract\to

Notice that there is no space between the “-o” switch and the output path. An example to unpack “C:\msi\setup.msi” file to “C:\msi\unpacked”:

7z x "C:\msi\setup.msi" -o"C:\msi\unpacked"

This command directs 7-Zip to extract all files from the specified MSI file to the target directory. The x switch indicates extraction, while the -o switch specifies the output directory. This process is straightforward and can be easily incorporated into scripts for automated tasks. 7-Zip’s support for various platforms, including Linux, makes it a practical tool for extracting MSI files across different operating systems.

Limitations and Considerations of Using 7-Zip

Despite its strengths, 7-Zip has limitations when extracting MSI files. The major one is that 7-Zip extracts only the OLE streams, including the CAB files that contain the application’s installation files. After this first extraction, you must extract the CAB files as well. However, the files inside these CABs are named with alphanumeric identifiers instead of their original names; the identifiers correspond to entries in the SQL tables, which hold the real names. Simple CAB extraction with 7-Zip does not perform this renaming, leaving the extracted files unusable. Additionally, while 7-Zip is powerful, it may not fully support all MSI-specific features and nuances, so additional steps or tools are needed to complete the extraction accurately.
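
The second extraction pass can be scripted. The following Python sketch walks the directory produced by the first “7z x” run, finds cabinet files by their ‘MSCF’ signature (extracted streams do not necessarily have a .cab extension), and builds one “7z x” command per cabinet. The function names and the ‘_files’ output suffix are hypothetical choices, and the renaming problem described above remains unsolved by this step.

```python
import os

CAB_MAGIC = b"MSCF"  # first four bytes of a Microsoft Cabinet file


def find_cabinets(directory):
    """Return sorted paths of files under directory that start with the CAB signature."""
    cabinets = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                if handle.read(4) == CAB_MAGIC:
                    cabinets.append(path)
    return sorted(cabinets)


def second_pass_commands(directory):
    """Build one '7z x' command per cabinet, each with its own output folder."""
    return [["7z", "x", cab, "-o" + cab + "_files"] for cab in find_cabinets(directory)]
```

Each returned list can be passed to `subprocess.run(command, check=True)`; passing the arguments as a list avoids shell quoting issues on both Windows and Linux.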

LessMsi

Overview and Features

LessMsi is a lightweight, open-source tool designed specifically for extracting the contents of MSI files. It provides a straightforward interface for browsing and extracting files from MSI packages without performing a complete installation. LessMsi supports command-line and graphical user interface (GUI) modes, making it versatile for different user preferences. Its primary feature is the ability to quickly and efficiently extract all files contained within an MSI package, making it an invaluable tool for developers and system administrators who need to inspect or repurpose the contents of MSI files.

How to Use LessMsi for Extraction

There is an official lessmsi website and a lessmsi GitHub page. There are many more sources, but these are the most common ones. Download the latest “Zip of lessmsi application binaries” version from the lessmsi GitHub releases page.

Using LessMsi to extract MSI files is simple and user-friendly. You can execute “lessmsi.exe” to run the GUI. To extract the MSI file via the command line, use the following:

lessmsi x path\to\file.msi path\to\extract\to\

An example to unpack “C:\msi\setup.msi” file to “C:\msi\unpacked”:

lessmsi x "C:\msi\setup.msi" "C:\msi\unpacked\"

Note: The target path to extract the MSI must have a trailing backslash (“\”) at the end. If not, the path will be treated as a specific file to extract from the MSI.

This command instructs LessMsi to extract all files from the specified MSI file and send them to the designated target directory. The x switch indicates extraction.
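
Because the trailing backslash is easy to forget in scripts, a small wrapper can add it automatically. This is a hedged sketch with a hypothetical function name; it only builds the argument list for the command shown above.

```python
def lessmsi_command(msi_path, target_dir):
    """Build the 'lessmsi x' argument list, forcing a trailing backslash on the target."""
    if not target_dir.endswith("\\"):
        # Without the trailing backslash, lessmsi treats the argument as a file name.
        target_dir += "\\"
    return ["lessmsi", "x", msi_path, target_dir]
```

The resulting list can be passed to `subprocess.run(...)` on Windows, assuming ‘lessmsi’ is on the PATH.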

Limitations and Considerations of Using LessMsi

Despite its ease of use, LessMsi has some limitations. One significant limitation is that LessMsi is primarily designed for Windows, with no native support for Linux. While it can be run on Linux using Wine, this adds an extra layer of potential issues.

Additionally, I tried the five latest versions of LessMsi, starting from 2.0.1 and going down, as the developer dropped support for .NET Framework lower than 4.8 after version 2.0.0. None of these versions could extract a specific type of MSI file I tested, possibly due to the file’s 1.5 GB size. Several issues on LessMsi’s GitHub page suggest problems, though no direct evidence links them to my case. While LessMsi worked well on smaller files, I couldn’t rely on it for production use, as most of my MSI files are at least 1 GB. During the tests, there was no output in the console, and no files were extracted. Since LessMsi didn’t work on Windows, there was no point in testing it on Linux.

jsMSIx (ActiveX DLL, VBScript, EXE)

Overview and Features

jsMSIx is a comprehensive toolset designed for extracting MSI files, available in several primary forms: an executable (EXE), an ActiveX DLL, and a VBScript. There are several other variations, but they’re not part of this test since I’m only interested in command-line compatibility. Users can choose the most suitable method through a standalone application, integration with other software via ActiveX, or automation with VBScript. jsMSIx.exe also provides a user-friendly interface.

How to Use jsMSIx for Extraction

You can check all the variations on the jsMSIx website. This is not an open-source project, so you will not get previous versions as you would on GitHub.

jsMSIx.exe

You can navigate to jsMSIx.exe directly.

After downloading the tool, users can run the executable to start the GUI and select and extract the contents of an MSI file. If you want to use the command line, execute this command:

jsMSIx.exe <MSI file path>[|<extraction directory>]

An example to unpack “C:\msi\setup.msi” file to “C:\msi\unpacked”:

jsMSIx "C:\msi\setup.msi" "|C:\msi\unpacked"

Note: The extraction path must be preceded by a pipe character (“|”), and both the pipe and the path must be inside the same pair of double quotes. Quoting it this way keeps cmd from interpreting the character as its own pipe operator.

The website does not document command-line usage anywhere. This syntax was taken from the jsMSIx WineHQ test page.

To run it on Ubuntu Linux, you must install Wine and winetricks. Then install the VB6 runtime with:

winetricks -q vb6run

If you have a “cabinet.dll” error, you can install it with:

winetricks -q cabinet

Then run the executable with:

wine jsMSIx.exe

The GUI should work. If not, copy “msi.dll” and “cabinet.dll” from a working Windows installation: take them from “System32” if you installed Wine as 64-bit, or from “SysWOW64” if you installed the 32-bit Wine.

jsMSIx.dll ActiveX

For those who need to integrate MSI extraction into other applications or scripts, the jsMSIx.dll ActiveX component provides a powerful solution. By registering the DLL, developers can call its methods to extract MSI files programmatically.

Download the jsMSIx.dll. Register it with:

regsvr32 /s jsMSIx.dll

You can extract the MSI files using the VBScript supplied with the downloaded archive. From the “jsMSI sample scripts” folder, find the “Drop File to Unpack MSI.vbs.” To extract, run from cmd:

cscript "Drop File to Unpack MSI.vbs" "C:\msi\setup.msi"

The file will be extracted to the directory where ‘Drop File to Unpack MSI.vbs’ resides.
If it does not work under 64-bit Wine on Ubuntu Linux, install it in a 32-bit Wine environment.
In addition, you will need to install the VBScript runtime with:

winetricks -q wsh57 vcrun6

Now, register the DLL and run the command:

wine regsvr32 /s jsMSIx.dll

wine cscript "Drop File to Unpack MSI.vbs" setup.msi

jsMSIx VBScript

The jsMSIx VBScript offers another automation option that wraps the Windows built-in “msi.dll” functionality. By including the appropriate references and using VBScript to call the DLL’s methods, users can automate the extraction process in environments where GUI tools or direct application integration are not feasible. This method is particularly useful for batch processing multiple MSI files or integrating extraction into existing automated systems.

Download the jsMSIx.vbs. Run it with the command line:

cscript MSIUnPack.vbs setup.msi

The contents of the MSI file will be extracted into the working directory.
To use this VBS file on Ubuntu, apply the same Wine setup described above for running “Drop File to Unpack MSI.vbs” with jsMSIx.dll.

Limitations and Considerations of Using jsMSIx

I’ve tested the latest version of jsMSIx.exe from the website, which is 1.9, and found that the command-line support is not working. After searching GitHub repositories, I found version 1.4, which had command-line support and worked fine on Windows but did not function under Ubuntu Wine, except for the GUI. I also tried jsMSIx.exe version 1.1, which was included with Universal Extractor 2 (more on that later). However, some files were not extracted from the MSI test file.

The jsMSIx.dll could not be registered on Windows, but it did register on Wine. However, the VBS provided with the DLL extracted the CAB contents without renaming the files to their appropriate names from the database, rendering it useless.

The VBS version worked well on Windows but did not work on Wine, as the complex VBScript could not be executed in the current state of Wine. Thus, none of these jsMSIx variations helped automate the process under Ubuntu Linux, highlighting the complexity of extracting MSI files on Linux.

WIX Toolset

Overview and Features

The WIX (Windows Installer XML) Toolset is an open-source project that provides a comprehensive suite of tools for creating, modifying, and extracting MSI files. It uses XML to define the contents and the structure of installation packages, allowing for precise control over the installation process. Due to its flexibility and robustness, developers widely use WIX to build complex installation packages. It supports various features, including creating custom actions, handling multiple installation scenarios, and detailed logging capabilities, making it an invaluable tool for managing MSI files.

How to Use WIX Toolset for Extraction

WIX 3

WIX 3 is the older version, but many MSI extraction tutorials give instructions specifically for it, because this toolkit ships ‘dark.exe’ as a separate executable for extracting MSI files. Dark, one of the primary tools for extraction, is a decompiler that converts MSI and MSM files into WIX source code.

Navigate to the WIX 3 GitHub releases page. Currently, version 3.14.1 is the latest release of major version 3.
Find the latest version section. Find the “Assets” section inside of it.
Download the binaries file: wix314-binaries.zip
Extract the archive and find the “dark.exe” file inside.

To extract an MSI file using WIX 3, you can use the following command:

dark.exe <MSI file> -x <Output Folder>

Working example:

dark.exe setup.msi -x D:\output

The above extracts the contents of the MSI file and sends it to the output directory.

WIX 5

WIX 5, currently the latest major version of the WIX Toolset, builds on the capabilities of WIX 3 with additional features and improvements. However, the usage and command structure have changed. You can visit the WIX website for the latest updates and see the WIX installation docs. You can also check the latest WIX version GitHub page.

As a prerequisite, you must download the latest version of .NET SDK and install it.
Open the command line and run this command to install the latest version of WiX as a global tool (available from any directory for the current user):

dotnet tool install --global wix

To install a specific version, check the install command on the NuGet page for the WiX version you need.

To extract an MSI file with WIX 5, you can use this command:

wix msi decompile <MSI file> -x <Output Folder>

Working example:

wix msi decompile setup.msi -x D:\output

Limitations and Considerations of Using the WIX Toolset

There was no point in testing WIX on Linux since it didn’t perform as expected on Windows. WIX 3 and WIX 5 produced the same output, meaning the MSI extraction mechanism wasn’t updated much between versions. They extracted these folders: File, Icon, ISSetupFile, and Binary (only the first CAB was extracted). Finally, the execution got stuck after several error messages. The WIX decompiler works well for MSI files created with WIX, but not for arbitrary MSI files, even though MSI is supposed to be a standard format.

Universal Extractor 2

Overview and Features

Universal Extractor 2 is a comprehensive tool designed to extract files from virtually any archive or installer, including MSI files. It supports various formats and provides an intuitive user interface for easy operation. The tool automates the extraction process by identifying the file type and applying the appropriate extraction tool, making it a valuable resource for users needing to handle various archives.

How to Use Universal Extractor for Extraction

Using Universal Extractor 2 to extract MSI files is straightforward.

Download the latest release from the Universal Extractor 2 GitHub page.

Extract the archive and execute ‘UniExtract.exe.’
The first execution will allow you to set up preferences and add Universal Extractor to the context menu (right-click menu).

After installing the tool, users can right-click on the MSI file and select the Universal Extractor option from the context menu. The tool automatically identifies the file type and offers several MSI extracting options. The user must manually select which extraction tool to use, and then the extraction process will begin. Alternatively, users can launch Universal Extractor, browse for the MSI file, and initiate extraction through the application’s interface.

Limitations and Considerations of Using Universal Extractor

One significant limitation is the lack of a command-line option for MSI extraction, which limits its utility for automated or batch-processing tasks. Although I did not test it as thoroughly as the other tools, relying on a GUI-based tool like Universal Extractor 2 can be overkill, especially when used on Linux through Wine. Wine is not yet mature enough to run such complex tasks.

MsiTools

Overview and Features

MsiTools is a suite of utilities designed to manipulate and extract information from MSI files. It includes various command-line tools that allow users to inspect, extract, and manipulate the contents of MSI packages. MsiTools is particularly useful for developers and system administrators who need detailed control over MSI files for deployment, troubleshooting, and customization tasks. The toolset is currently available only for Linux. You can visit the MsiTools GNOME page and MsiTools GitHub page.

MsiTools is built on libmsi, a port of Wine’s implementation of the Windows Installer. The suite is still in its early stages but will probably improve.

How to Use MsiTools for Extraction

Since we’re using Ubuntu, the easiest method to install msitools is using apt from terminal:

sudo apt-get update
sudo apt install msitools -y

Checking the version:

msiextract --version

If you’re on Ubuntu 22.04 LTS, version 0.101 will be installed this way. Currently, the latest version is 0.103. You can build this version straight from the source. Create the bash script ‘msitools0103.sh’ with this content:

#!/bin/bash

# Stop at the first failing command so the success message is not printed after an error.
set -e

# Update package lists.
sudo apt update
# Install dependencies.
sudo apt install -y gcc make meson ninja-build pkg-config libglib2.0-dev libgsf-1-dev gobject-introspection wget valac cmake libgirepository1.0-dev gtk-doc-tools

# Download and build libgcab from source
wget http://ftp.gnome.org/pub/GNOME/sources/gcab/1.4/gcab-1.4.tar.xz
tar -xf gcab-1.4.tar.xz
cd gcab-1.4
meson setup build
ninja -C build
sudo ninja -C build install
cd ..

# Download msitools 0.103 source code
wget http://ftp.gnome.org/pub/GNOME/sources/msitools/0.103/msitools-0.103.tar.xz

# Extract the downloaded tarball
tar -xf msitools-0.103.tar.xz
cd msitools-0.103

# Create a build directory and configure the build with Meson
meson setup build

# Compile the source code using Ninja
ninja -C build

# Install the compiled binaries
sudo ninja -C build install

# Cleanup
cd ..
rm -rf msitools-0.103 msitools-0.103.tar.xz gcab-1.4 gcab-1.4.tar.xz

echo "msitools 0.103 has been successfully installed."

Make this file executable:

chmod +x msitools0103.sh

And execute it:

./msitools0103.sh

Now, checking the version should show you the 0.103 version.

Using MsiTools to extract MSI files involves straightforward command-line instructions. The ‘msiextract’ tool is included in the suite to handle the extraction. The following command will extract the MSI into the current working directory:

msiextract /path/to/your_file.msi

List all the files inside the MSI in the console:

msiextract --list msi_file.msi

Extract the MSI file to a specific directory:

msiextract --directory "/path/to/extract/to" msi_file.msi
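
In an automated pipeline, it helps to verify that an extraction actually produced files, since a tool can finish without writing anything. The Python sketch below wraps the msiextract options shown above; the function names are hypothetical, and it assumes ‘msiextract’ is on the PATH.

```python
import os
import subprocess


def msiextract_command(msi_path, target_dir=None, list_only=False):
    """Build the msiextract argument list from the options shown above."""
    command = ["msiextract"]
    if list_only:
        command.append("--list")
    if target_dir is not None:
        command += ["--directory", target_dir]
    command.append(msi_path)
    return command


def extract_or_fail(msi_path, target_dir):
    """Run msiextract; raise if it exits non-zero or leaves the target empty."""
    subprocess.run(msiextract_command(msi_path, target_dir), check=True)
    extracted = sum(len(files) for _root, _dirs, files in os.walk(target_dir))
    if extracted == 0:
        raise RuntimeError(f"msiextract produced no files for {msi_path}")
    return extracted
```

Counting the files after the run is a deliberately simple check; comparing the result against the output of the ‘--list’ option would be a stricter variant.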

Limitations and Considerations of Using MsiTools

One significant limitation is that it currently supports Linux only. Since a built-in Windows msiexec can already use the command line for automation, I wanted to check how MsiTools works on Ubuntu. Another limitation is handling complex MSI files. It did nothing to extract my MSI test file, though the ‘list’ option did show the embedded files.

msidump (Python based)

Overview and Features

msidump is a Python-based tool designed to extract and analyze the contents of MSI files. Hosted on GitHub, msidump leverages Python for MSI extraction and security triage. It parses the MSI file structure, extracts the data, and scans it with YARA rules to flag suspicious content. msidump is particularly useful for developers and security professionals who need to analyze MSI files in detail, offering a scriptable and extensible approach to MSI extraction.

How to Use msidump for Extraction

Using msidump to extract MSI files involves running a Python script from the command line.

Download and install Python, adding it to the PATH environment variable when prompted.

Download the msidump archive from GitHub.

Extract the archive, then navigate to that directory with cmd. Execute this command in cmd to install prerequisites:

pip install -r requirements.txt

Execute this command to extract MSI:

python msidump.py "path\to\file.msi" -O "path\to\directory\to\extract" --extract all

You can use the ‘-h’ parameter to understand the switches, but ‘--extract all’ should extract all possible files from the MSI.

Limitations and Considerations of Using msidump

I tested this tool on Windows, but unfortunately, it didn’t extract the test MSI file as expected (no CAB files were extracted), so there was little reason to rely on it on Linux. Still, since msidump uses the Windows Installer COM object for interaction, I wanted to see how that works in Wine on Ubuntu. This resulted in numerous exceptions, rendering the COM object unusable on Wine.

MsiAnalyzer

Overview, Features, and Usage

MsiAnalyzer is similar to the other tools, but it is written in C++ and is older than the rest (the last update was in 2020).

After downloading MsiAnalyzer from GitHub, extracting it, and compiling the executable, the usage is simple:

MsiAnalyzer.exe <msi_file> <output_dir>

Limitations and Considerations

You get two folders: one for all the embedded files, including the CAB files, and the second for the MSI tables. In addition, you get ‘actions.txt’ with the action scripts. This is not enough. We need the contents of the CAB file extracted and renamed according to the file names in the MSI table.

I also compiled MsiAnalyzer on Ubuntu, and it ran well. The output paths used Windows-style backslashes, but the real problem remained the CAB extraction and renaming.
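
The separator issue itself is easy to work around in a script. As a minimal sketch (the function name is hypothetical), Python’s pathlib can convert a Windows-style relative path into a POSIX one:

```python
from pathlib import PureWindowsPath


def to_posix_relative(windows_style_name):
    """Convert a backslash-separated relative path to forward slashes."""
    return PureWindowsPath(windows_style_name).as_posix()
```

For instance, `to_posix_relative("Binary\\setup.cab")` gives ‘Binary/setup.cab’, which you could then use to move a wrongly named output file into the matching subdirectory.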

Python’s Built-in msilib Module

Overview and Features

Python’s built-in msilib is a standard library module that provides functionality for creating, reading, and modifying MSI files. It lets developers interact with MSI packages programmatically, making it a versatile tool for automating MSI extraction and manipulation tasks. Its features include inspecting the database tables within an MSI file, modifying existing records, and extracting embedded files with the help of external CAB modules. Under the hood, msilib relies on the ‘_msi’ C extension module, which wraps the Win32 API functionality of the ‘msi.dll’ file, so it is available only in Windows builds of Python.

Limitations and Considerations

To my knowledge, no other Python modules provide MSI integration functionality, which makes msilib the only maintained Python module available. The problem is that the module is already deprecated (since Python 3.11) and is slated for removal in Python 3.13 under PEP 594.

Exploring MSI File Data Tables – Orca

Exploring the data tables within MSI files is crucial for understanding the installation process and modifying the package contents. Orca, a tool provided by Microsoft, is designed explicitly for this purpose. This section discusses what Orca is, how to install it, and how to use it to explore MSI files.

What is Orca

Orca is a database editor for Windows Installer packages (MSI) that allows users to view and edit the database tables that define the installation process. It provides a user-friendly interface to navigate various tables and entries within an MSI file. Orca is an essential tool for developers and system administrators who need to customize or troubleshoot MSI packages, offering detailed insights into the file’s structure and contents.

How to Install Orca

Installing Orca is straightforward but requires the Windows SDK. To install Orca, follow these steps:

  • Navigate to the Windows SDK Download page.
  • Click on [Download the installer >] to save the executable.
  • Run the installer.
    • On the “Specify Location” screen:
      (*) Install the Windows Software Development Kit
      [Next]
    • “Windows Kits Privacy” screen:
      Allow Microsoft to collect insights:
      (*) No
      [Next]
    • “License Agreement”:
      [Accept]
    • “Select the features you want to install”:
      <Deselect all the features>
      Select only [v] MSI Tools.
      [Install]
  • In File Explorer, navigate to this folder: c:\Program Files (x86)\Windows Kits\10\bin\10.0.26100.0\x86\
    Note that ‘10.0.26100.0’ may be different when you install it.
  • Run ‘Orca-x86_en-us.msi’ to install Orca.
  • After installation, the Orca shortcut will be available from the Start Menu or in “C:\Program Files (x86)\Orca\Orca.exe” by default.

How to Explore MSI File with Orca

Using Orca to explore an MSI file is simple and effective. To get started, open Orca and follow these steps:

  1. Open Orca from the Start menu.
  2. Click on File and then Open. Browse to the MSI file you want to explore and open it.
  3. Orca will display the database tables within the MSI file in the left pane. Clicking on any table will show its contents in the right pane.

Orca allows you to view and edit the contents of these tables, providing a detailed understanding of the MSI file’s structure. This tool is invaluable for examining custom actions, file sequences, and other critical components, making it easier to manage and customize MSI packages. Understanding these data tables is essential for anyone looking to extract MSI files or modify their contents, highlighting the complexity and detail involved in working with MSI files.

Extract MSI – Necessary Tables

Extracting MSI files effectively requires a thorough understanding of the various tables within the MSI database. These tables contain critical information about the files, locations, and the installation process. This section discusses the necessary tables for MSI file extraction, how to enumerate them, and how to correlate between these tables to extract the files successfully.

Enumerating the Tables

Enumerating the tables within an MSI file is the first step in understanding its structure and contents. Key tables include:

  • Binary: contains files used by the installation process. It has a Name column that lists the file names and a Data column that holds the binary data of each file, so the files can be saved to disk directly from this table.
  • Icon: contains icon files for the installation. It has the same Name and Data columns, so these files can also be saved directly to disk.
  • ISSetupFile: contains more files used by the installation. It has three columns, but the important ones are FileName, which holds the file names, and Stream, which holds the binary data of each file, so they can be extracted directly from this table.
  • Registry: contains the registry changes made during the installation. It does not hold files, but since we are already here, I save it as a REG file for easier reference. The table has six columns, but we need only four: Root, Key, Name, and Value. The Root column contains an integer that represents the registry root key:
      0: HKEY_CLASSES_ROOT
      1: HKEY_CURRENT_USER
      2: HKEY_LOCAL_MACHINE
      3: HKEY_USERS
    Combining the columns into a string gives: Root\Key\Name = Value. If the Name is empty, it is replaced with the “@” character: Root\Key\@ = Value.
  • Media: provides the CAB file names in the Cabinet column. The name of a CAB file embedded in the MSI begins with a “#” character.
  • _Streams: holds the CAB archives themselves. The Name column has the file names we got from the Media table, and the Data column has the binary data we save to disk as CAB files.
  • File: lists all the files in the CAB archives and the names they will be renamed to. The necessary columns are File, FileName, and Component_. File is the name of the file inside the CAB archive. FileName is the name that file is renamed to after extraction. Component_ holds an alphanumeric identifier that represents the folder to which the file will be extracted.
  • Component: has the information needed to convert the Component_ field from the File table to a directory. The necessary columns are Component and Directory_.
  • Directory: has the information needed to convert the Directory_ field from the Component table to a physical directory on the disk. The necessary columns are Directory, Directory_Parent, and DefaultDir.
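The Registry formatting rule described in the list can be sketched in a few lines of Python. The function name, the placeholder for unknown roots, and the sample rows are hypothetical and only illustrate the Root\Key\Name = Value layout.

```python
REGISTRY_ROOTS = {
    0: "HKEY_CLASSES_ROOT",
    1: "HKEY_CURRENT_USER",
    2: "HKEY_LOCAL_MACHINE",
    3: "HKEY_USERS",
}


def format_registry_row(root, key, name, value):
    """Build one 'Root\\Key\\Name = Value' line from the four needed fields."""
    # Roots outside the four listed values are kept as a visible placeholder.
    root_name = REGISTRY_ROOTS.get(root, f"UNKNOWN_ROOT_{root}")
    # An empty Name is written as "@".
    value_name = name if name else "@"
    return f"{root_name}\\{key}\\{value_name} = {value}"


# Made-up sample rows in (Root, Key, Name, Value) order.
rows = [
    (2, r"SOFTWARE\Example\App", "InstallDir", "[INSTALLDIR]"),
    (1, r"Software\Example\App", "", "1"),
]
lines = [format_registry_row(*row) for row in rows]
```

Writing the resulting lines to a text file gives a readable summary of the registry changes. Producing a REG file that Windows can import would need the additional syntax of that format.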

Using tools like Orca or scripts utilizing ‘msi.dll,’ you can enumerate these tables and gather the necessary details for extraction.

Correlating Between the Tables to Extract MSI Files

As you can see, the real problem is extracting the CAB archives, extracting files from the CAB archives, and then renaming them and sorting them into appropriate folders.

The File table entries must be linked to their corresponding components in the Component table, which are then mapped to the directory paths in the Directory table. Each file entry references a specific file inside the CAB archive for extraction. The Media table provides the names of the CAB files, which will be correlated against the _Streams table to get the CAB binaries. Understanding these relationships ensures the extracted files are correctly named and placed in their intended directories. This process highlights the complexity of extracting MSI files, as it involves parsing multiple tables and accurately reconstructing the file system.
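The File, Component, and Directory correlation can be sketched with plain dictionaries. This is a simplified illustration and not the code of my module: the table contents are made up, and the helper long_name relies on the MSI conventions that a name field may be written as short|long and that DefaultDir may be written as target:source, with "." meaning "same folder as the parent".

```python
def long_name(field):
    """Pick the usable name from an MSI 'target:source' / 'short|long' field."""
    target = field.split(":", 1)[0]   # the target part comes before ':'
    return target.split("|", 1)[-1]   # the long name comes after '|'


def resolve_directory(directory_id, directory_table):
    """Walk Directory_Parent links upward and return the relative folder path."""
    parts = []
    seen = set()
    current = directory_id
    while current and current in directory_table and current not in seen:
        seen.add(current)  # stops on self-referencing or cyclic entries
        parent, default_dir = directory_table[current]
        name = long_name(default_dir)
        if name != ".":    # "." adds no folder level of its own
            parts.append(name)
        current = parent
    return "/".join(reversed(parts))


def resolve_file_path(file_row, component_table, directory_table):
    """Map one File row to 'folder/long file name'."""
    _file_key, file_name, component = file_row
    directory_id = component_table[component]
    folder = resolve_directory(directory_id, directory_table)
    name = long_name(file_name)
    return f"{folder}/{name}" if folder else name


# Hypothetical sample rows, shaped like the tables described above.
# Directory: Directory -> (Directory_Parent, DefaultDir)
directory_table = {
    "TARGETDIR": ("", "SourceDir"),
    "ProgramFilesFolder": ("TARGETDIR", "."),
    "INSTALLDIR": ("ProgramFilesFolder", "EXAMP~1|Example App"),
}
# Component: Component -> Directory_
component_table = {"MainComponent": "INSTALLDIR"}
# File: (File, FileName, Component_)
file_row = ("app.exe_1A2B", "APP~1.EXE|application.exe", "MainComponent")

resolved = resolve_file_path(file_row, component_table, directory_table)
```

With these sample rows, resolved is "SourceDir/Example App/application.exe", which is where the file named app.exe_1A2B inside the CAB archive should end up.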

Extract MSI Files – My Solution

With the complexity of extracting MSI files, especially across operating systems like Windows and Linux, I developed a custom solution using Python. This solution leverages Python’s ctypes library to interact with the msi.dll, providing a versatile, cross-platform (Wine) approach to MSI extraction. This section outlines the key components of the solution, including the reasons for choosing Python, an introduction to ctypes, and how I leverage msi.dll to create a powerful and flexible tool for extracting MSI files.

In the first stage, the script extracts the directly accessible files from the tables (binaries with their relative file names) and the CAB archives. In the second stage, the CABs are extracted with 7z. Lastly, the extracted files are renamed using information from the appropriate tables.

I used 7z to extract the CABs to save time, instead of developing a custom solution on top of the ‘cabinet.dll’ file for the same functionality. 7z is a robust, portable tool that extracts many formats, including CAB archives.
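Calling 7z from Python for the second stage can look roughly like this. It is a sketch with hypothetical function names and paths, not the exact call my module makes; the switches used are the standard 7z ones (x, -o, -y).

```python
import subprocess


def build_7z_extract_command(cab_path, output_dir, seven_zip="7z"):
    """Return the argument list for extracting one CAB archive with 7z."""
    return [
        seven_zip,
        "x",                 # extract with full paths
        cab_path,
        f"-o{output_dir}",   # output directory (no space after -o)
        "-y",                # answer "yes" to all prompts, so nothing blocks
    ]


def extract_cab(cab_path, output_dir, seven_zip="7z"):
    """Run 7z and raise CalledProcessError if it reports a failure."""
    command = build_7z_extract_command(cab_path, output_dir, seven_zip)
    subprocess.run(command, check=True)


# Example paths are placeholders.
command = build_7z_extract_command("CAB_Files/Data1.cab", "extracted")
```

Keeping the command as a list avoids shell quoting issues with paths that contain spaces.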

Introduction to Python Solution

My Python solution was developed to address the limitations of existing tools in extracting MSI files, mainly when dealing with complex and large files and cross-platform environments. Using Python, I created a script that can reliably extract MSI files on Windows and Linux, offering a consistent and automated approach to handle even the most complex MSI packages. This solution simplifies the extraction process and can be easily integrated into various workflows.

Why Python

I chose Python because its versatility, cross-platform capabilities, readability, and relative ease of use make it ideal for scripting complex tasks like MSI extraction. Additionally, Python’s broad adoption means that other developers can easily modify and extend the solution, so it stays adaptable to future needs. The ability to install Python on both Windows and Wine was a critical factor in its selection, because it lets the same script run in both environments.

Introduction to ctypes and msi.dll to Extract MSI

ctypes is a foreign function library in Python that provides C-compatible data types and allows calling functions in DLLs or shared libraries. This library is crucial to the solution, enabling it to interact directly with msi.dll, which is essential for extracting MSI files. Using ctypes, developers can call functions from msi.dll and manipulate MSI files at a low level, bypassing the need for external tools and providing greater control over the extraction process.

Wine’s implementation of msi.dll was good enough for this script to work as expected without building an MSI implementation from scratch.

Benefits: Linux and Windows

The primary benefit of our Python solution is its ability to operate seamlessly on both Linux and Windows. Our solution eliminates the need for multiple tools or manual intervention, streamlining the process of extracting MSI files and ensuring that it can be done reliably, regardless of the underlying platform.

Extract MSI – Code Examples and Explanation

The MSI unpacker is available on GitHub as part of my Python module atomicshop, which is also published on PyPI. This section provides a detailed code breakdown, focusing on key functions and modules. You can explore the full implementation on GitHub and integrate it into your workflows.

Detailed Code Breakdown

The solution’s core lies in the base.py module, which handles the essential functions for interacting with MSI files through ctypes and msi.dll. Below is an explanation of the critical functions within this module and how they are used and correlated:

create_open_db_handle: This function creates and opens a handle to the MSI database. It is the foundation for all subsequent operations, allowing the script to interact with the MSI file’s data tables.

create_open_execute_view_handle: Once the database handle is created, this function opens and executes a view on a specific table within the MSI file. This is crucial for retrieving records from the database, which are then used for file extraction or analysis.

create_fetch_record_from_view_handle: This function fetches individual records from the view created in the previous step. It iterates over the records in a table, enabling the script to process each piece of data within the MSI file systematically.

get_table_field_data_from_record: This function extracts specific field data from a record fetched from the MSI database by type (e.g., string, binary).

These functions work together to provide a comprehensive approach to MSI extraction, allowing for precise control over the process and ensuring that all relevant data is accurately retrieved and utilized.
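The sequence those four functions describe (open the database, open and execute a view, fetch records, read fields) maps onto a handful of Win API functions in msi.dll. The following is a minimal sketch of that sequence with ctypes, not the code from base.py; the Python function names are my own for this example, and it only runs on Windows or under Wine.

```python
import ctypes

ERROR_SUCCESS = 0
ERROR_NO_MORE_ITEMS = 259  # returned by MsiViewFetch after the last row


def build_select_query(table, columns):
    """Build an MSI SQL query; identifiers are quoted with backticks."""
    column_list = ", ".join(f"`{name}`" for name in columns)
    return f"SELECT {column_list} FROM `{table}`"


def read_string_field(msi, record, field_index):
    """Read one string field; the first call only asks for the length."""
    size = ctypes.c_ulong(0)
    probe = ctypes.create_unicode_buffer(1)
    msi.MsiRecordGetStringW(record, field_index, probe, ctypes.byref(size))
    size.value += 1  # room for the terminating null character
    buffer = ctypes.create_unicode_buffer(size.value)
    msi.MsiRecordGetStringW(record, field_index, buffer, ctypes.byref(size))
    return buffer.value


def read_table_strings(msi_path, table, columns):
    """Open the database, run a view, and fetch every row as strings."""
    msi = ctypes.WinDLL("msi.dll")  # only exists in Windows builds of Python
    database = ctypes.c_ulong(0)    # MSIHANDLE is a 32-bit unsigned integer
    view = ctypes.c_ulong(0)
    rows = []

    # A NULL persist argument means MSIDBOPEN_READONLY.
    status = msi.MsiOpenDatabaseW(ctypes.c_wchar_p(msi_path), None, ctypes.byref(database))
    if status != ERROR_SUCCESS:
        raise OSError(f"cannot open {msi_path} (error {status})")
    try:
        query = build_select_query(table, columns)
        status = msi.MsiDatabaseOpenViewW(database, ctypes.c_wchar_p(query), ctypes.byref(view))
        if status != ERROR_SUCCESS:
            raise OSError(f"cannot open view (error {status}): {query}")
        msi.MsiViewExecute(view, 0)
        while True:
            record = ctypes.c_ulong(0)
            status = msi.MsiViewFetch(view, ctypes.byref(record))
            if status == ERROR_NO_MORE_ITEMS:
                break
            if status != ERROR_SUCCESS:
                raise OSError(f"fetch failed (error {status})")
            rows.append([read_string_field(msi, record, i)
                         for i in range(1, len(columns) + 1)])
            msi.MsiCloseHandle(record)
    finally:
        if view.value:
            msi.MsiCloseHandle(view)
        msi.MsiCloseHandle(database)
    return rows
```

This sketch covers string fields only. Binary columns, such as Data in the Binary table, are read with MsiRecordReadStream instead of MsiRecordGetStringW.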

Key Modules Used to Extract MSI

The solution is modular, with each component serving a specific purpose in the extraction process. Below are the key modules used in the project, along with a brief explanation of their roles:

extract_msi_main.py: This module is the main entry point for the MSI extraction process. It orchestrates the overall workflow, calling functions from other modules to handle the database interactions, file extraction, and any necessary data processing.

base.py: As previously discussed, this module contains the foundational functions for interacting with MSI files via ctypes. It provides the core utilities to open MSI databases, execute views, fetch records, and extract field data.

cabs.py: This module handles CAB files after their extraction from the MSI. CAB files contain the compressed installation files, and this module ensures they are correctly extracted and renamed, correlating the extracted contents with the MSI’s database entries.

tables.py: The tables.py module focuses on interacting with specific tables within the MSI database. It provides specialized functions for accessing and manipulating the data in these tables, ensuring that the extraction process accounts for all necessary components and relationships.
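To illustrate the renaming step handled after the CAB extraction, here is a small self-contained sketch. It assumes a mapping from the File column (the name inside the CAB) to the final relative path has already been built from the tables. The function name and the sample mapping are hypothetical, and the demonstration runs in a temporary folder.

```python
import shutil
import tempfile
from pathlib import Path


def place_extracted_files(cab_output_dir, target_root, path_by_file_key):
    """Move each extracted file to its final relative path; return the new paths."""
    moved = []
    for file_key, relative_path in path_by_file_key.items():
        source = Path(cab_output_dir) / file_key
        if not source.is_file():
            continue  # the file may live in another CAB archive
        destination = Path(target_root) / relative_path
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination))
        moved.append(destination)
    return moved


# Demonstration with a throwaway folder and a made-up File table entry.
with tempfile.TemporaryDirectory() as work_dir:
    cab_dir = Path(work_dir) / "cab_output"
    cab_dir.mkdir()
    (cab_dir / "app.exe_1A2B").write_bytes(b"demo")

    mapping = {
        "app.exe_1A2B": "Example App/application.exe",
        "missing_key": "Example App/absent.dll",
    }
    result = place_extracted_files(cab_dir, Path(work_dir) / "final", mapping)
    moved_names = [path.name for path in result]
    final_exists = (Path(work_dir) / "final" / "Example App" / "application.exe").is_file()
```

Entries that are missing from the current CAB output are skipped instead of raising an error, since an MSI package can spread its files across several CAB archives.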

Using AtomicShop MSI Unpacker on Windows to Extract MSI

The AtomicShop MSI Unpacker is designed to work seamlessly on Windows. This section guides you through the installation and setup process, followed by a step-by-step usage guide to help you effectively utilize the solution on a Windows system.

Installation and Setup

To get started with AtomicShop MSI Unpacker on Windows, follow these installation and setup steps:

  1. Install Python: download the latest Python version. During installation, make sure to add Python to your PATH environment variable. Alternatively, you can use my cmd batch script from GitHub to install the latest Python version and atomicshop in one click. After running the script, you can skip the second step.
  2. Install the atomicshop module: Once Python is installed, open a command prompt and install the atomicshop module using pip:
    pip install --upgrade atomicshop
  3. Install 7z: AtomicShop MSI Unpacker utilizes 7z to extract files from CAB archives within MSI packages. Download and install 7-Zip from the official website. Ensure the 7z command is available in your PATH environment variable to be accessed from the command prompt as ‘7z’.
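If you want to verify from Python that 7z is reachable before running an extraction, a check along these lines works. The function name and its parameters are hypothetical; shutil.which performs the same lookup the command prompt does when you type ‘7z’.

```python
import os
import shutil


def resolve_7z(explicit_path=None, command_name="7z"):
    """Return a usable 7z path: the explicit one if given, otherwise a PATH lookup."""
    if explicit_path:
        if os.path.isfile(explicit_path):
            return explicit_path
        raise FileNotFoundError(f"7z not found at: {explicit_path}")

    found = shutil.which(command_name)  # None when the command is not in PATH
    if found is None:
        raise FileNotFoundError(f"'{command_name}' is not available in PATH")
    return found
```

Calling resolve_7z() with no arguments either returns the full path of the 7z executable or raises FileNotFoundError with a clear message.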

Step-by-Step Usage Guide

After completing the installation and setup, open a command prompt and run the extraction process by executing the main Python file with the following command:

python -m atomicshop.a_mains.msi_unpacker

Specify Command-Line Switches: The tool accepts several switches to customize the extraction process:

  • -m for the input MSI file. Specify the path to the MSI file you want to extract.
  • -o for the output folder. This is the directory to save the extracted files.
  • -s (optional) for the full file path to 7z.exe. If not specified, the tool assumes 7z is available in your PATH. This switch is helpful if 7-Zip is not installed in the default location or if you prefer to specify the path manually.

An example command might look like this:

python -m atomicshop.a_mains.msi_unpacker -m "C:\path\to\your.msi" -o "C:\path\to\output\folder" -s "C:\Program Files\7-Zip\7z.exe"

Using AtomicShop MSI Unpacker on Linux to Extract MSI

For Linux users, particularly those using Ubuntu, AtomicShop MSI Unpacker can be set up and executed through Wine, allowing you to run the necessary Windows components seamlessly. This section provides detailed instructions for installation, setup, and usage on Ubuntu, ensuring you can effectively leverage the solution on a Linux platform.

Installation and Setup

To set up the solution on Ubuntu, follow these steps:

1. Install the Latest Version of Wine: Wine enables the execution of Windows applications on Linux. Run the following script to install the latest stable version of Wine:

#!/bin/bash

# Update and upgrade the system
sudo apt update -y
sudo apt upgrade -y

# Enable 32-bit architecture
sudo dpkg --add-architecture i386

# Add WineHQ repository key
wget -nc https://dl.winehq.org/wine-builds/winehq.key
sudo apt-key add winehq.key

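# Add WineHQ repository ('focal' is the Ubuntu 20.04 codename; replace it with the codename of your release, see 'lsb_release -cs')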
sudo apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ focal main'

# Update package list
sudo apt update -y

# Install Wine stable
sudo apt install --install-recommends winehq-stable -y

# Verify Wine installation
wine --version

2. Install Python: Use the following script to install Python in the Wine environment. Adjust the version number as needed:

# Download python.
wget https://www.python.org/ftp/python/3.11.9/python-3.11.9-amd64.exe

# Silently Install in Wine. Use '/quiet' for no GUI.
wine python-3.11.9-amd64.exe /passive InstallAllUsers=1 PrependPath=1 TargetDir="C:\Python311" AssociateFiles=1 InstallLauncherAllUsers=1

# Check the Python version.
wine python --version

# Remove the file.
rm python-3.11.9-amd64.exe

3. Install the atomicshop module: After Python is installed, use pip within the Wine environment to install the atomicshop module:

wine pip install --upgrade atomicshop

4. Download and Extract 7z Portable: Since 7z is required for MSI extraction, download the portable version of 7-Zip, extract it, and ensure you can access the executable from your Wine environment.

Step-by-Step Usage Guide

After installation and setup, you can use AtomicShop MSI Unpacker on Ubuntu with Wine. Execute the Main Python File:

wine python -m atomicshop.a_mains.msi_unpacker

Specify Command-Line Switches: The tool accepts the same switches as on Windows, with one key difference being that you need to specify the full file path for the portable 7z that you extracted. An example command might look like this:

wine python -m atomicshop.a_mains.msi_unpacker -m "path/to/your.msi" -o "path/to/output/folder" -s "path/to/7z/7z.exe"
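When the unpacker is called from another Python script, for example in an automated pipeline, the command line can be assembled programmatically. The sketch below only builds the argument list from the switches described in this article. The function name and the use_wine parameter are hypothetical and not part of atomicshop.

```python
import sys


def build_unpacker_command(msi_path, output_dir, seven_zip_path=None, use_wine=None):
    """Return the argument list that runs the unpacker module."""
    if use_wine is None:
        # Assumption: anything that is not Windows goes through Wine.
        use_wine = not sys.platform.startswith("win")

    command = ["wine", "python"] if use_wine else ["python"]
    command += ["-m", "atomicshop.a_mains.msi_unpacker", "-m", msi_path, "-o", output_dir]
    if seven_zip_path:
        command += ["-s", seven_zip_path]  # optional on Windows when 7z is in PATH
    return command


linux_command = build_unpacker_command(
    "path/to/your.msi", "path/to/output/folder", "path/to/7z/7z.exe", use_wine=True
)
```

The resulting list can be passed to subprocess.run(linux_command, check=True) to start the extraction without user intervention.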

Extracted MSI Components

When you extract an MSI file using AtomicShop MSI Unpacker, the output folder is organized into specific directories that categorize the various components extracted from the MSI package. Understanding the structure of these directories is essential for effectively managing and utilizing the extracted files. This section explains the directories in the output folder and their purpose, helping you understand the extracted MSI components efficiently.

Directories in the Output Folder

Embedded_MSI_Files

This directory contains all the files embedded directly within the MSI package. These files are extracted from various tables within the MSI database and are crucial for the installation process.

Binary_Files: This folder contains binary data extracted from the MSI’s Binary table. These files are often used for custom actions or embedded executables necessary during installation.

CAB_Files: The CAB_Files directory includes the CAB archives extracted from the MSI. These archives contain the compressed installation files, which are later unpacked and placed into their respective directories.

Icon_files: This folder stores the icon files extracted from the MSI, typically found in the Icon table. The application or installer uses these icons for shortcuts and other visual elements.

ISSETUPFILES: Files related to the InstallShield setup process are placed here. These are essential for MSI packages created with InstallShield and may include additional installation scripts or resources.

OLE_Metadata: The OLE_Metadata directory holds metadata about the MSI file itself, such as the software the MSI file was created with and the name of the company that created it.

Registry_Changes: This folder includes registry modification files extracted from the MSI, which are intended to make necessary changes to the Windows registry during installation.

Table_Contents: This directory contains the raw data extracted from the MSI’s internal tables and saved as a CSV file for each table.
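Saving a table as CSV is straightforward once its column names and rows are in memory. The sketch below shows the idea with Python’s csv module. The exact layout my tool writes may differ, and the function name and sample row are made up for illustration.

```python
import csv
import io


def table_to_csv(column_names, rows):
    """Serialize one table to CSV text: a header line, then one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()


# One made-up row shaped like the File table.
csv_text = table_to_csv(
    ["File", "FileName", "Component_"],
    [("app.exe_1A2B", "APP~1.EXE|application.exe", "MainComponent")],
)
```

Writing csv_text to a file named after the table, for example File.csv, gives one CSV file per table.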

Extracted_MSI_Installation_Files

The Extracted_MSI_Installation_Files directory contains the files renamed and extracted from the CAB archive. These files represent the application or installation components as they appear on the target system. The extraction process correlates the file names with the entries in the MSI database, ensuring that they are correctly named and organized according to the original installation script.

Extract MSI: Conclusion

Extracting MSI files is a complex task that requires a deep understanding of the MSI file structure, the tools available, and the challenges associated with cross-platform environments like Windows and Linux. AtomicShop MSI Unpacker, leveraging ctypes and msi.dll, offers a powerful and flexible approach to overcome these challenges and efficiently extract MSI files on both operating systems.

Summary of Key Points

Throughout this guide, we explored the complexity of MSI file extraction, including the necessary tables and their correlations, the tools available for extraction, and the specific solution I developed using Python. The installation and usage processes were detailed for Windows and Linux, ensuring users could implement the solution regardless of their platform. Additionally, the structure of the extracted MSI components was examined, providing a clear understanding of how the files are organized and what each directory contains.

Final Thoughts on Extracting MSI Files

Extracting MSI files, especially across different operating systems, presents challenges because of the MSI format structure and the tools required. My solution addresses these challenges by providing a robust, cross-platform tool that simplifies extraction, ensuring accuracy and efficiency. Whether you are a developer, system administrator, or security professional, understanding how to extract and manage MSI files is essential, and my Python-based approach offers the versatility and reliability needed to handle this task effectively.
