In my current research project, I am working with two well log datasets from Equinor. The first is a large dataset that they released to CIUS, my research group. The second is a smaller freely available dataset called Volve Data Village. The files in those datasets contain measurements from many of Equinor’s subsea wells on the Norwegian continental shelf. These data files are primarily in the DLIS format, formally known as API RP66.
Even though DLIS is the most common format for well log data today, only a very limited number of programs can read it. In addition, most of these programs are geared towards displaying the data so that log interpreters can analyse it visually. What I need, on the other hand, is full access to the data so that I can run my own computational analyses.
When I started my post-doc around a year ago, I had to figure out how to get the data out of DLIS files so that I could work with it. Since then, I have learned quite a bit about how to read these files. In this post, I want to share some of what I have learned with you.
What are DLIS files?
In 1991, the American Petroleum Institute introduced the Digital Log Interchange Standard (DLIS) format as their Recommended Practice 66 (API RP66). This original version of the standard has now become the standard format for well log data. (The second version of the DLIS standard from 1996 never really caught on.)
Sets, frames, and channels
We can divide the content of DLIS files into two main components, namely sets and frames.
A DLIS set is basically a table of data that describes the logging situation. For example, one set might specify the tools used in the logging run. Another set might specify which frames are available in the file. Other sets may specify the parameters used while logging and processing the data.
However, sets do not contain the data measured throughout the well. Instead, DLIS channels hold this data as a function of depth. DLIS frames consist of DLIS channels with the same resolution (and the same depth axis, which itself is a channel in the frame). As shown to the right, DLIS channels may be one-dimensional (i.e. hold one value per depth), two-dimensional (e.g. values over multiple angles or multiple samples of a time signal per depth), or even higher-dimensional. Common depth resolutions for DLIS frames are 6 in, 3 in, 2 in, 1.5 in, and 1 in.
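To make the frame and channel relationship a bit more concrete, here is a minimal sketch of it as a Python data structure. The `Channel` and `Frame` classes and all the names and values in them are hypothetical and only illustrate the idea; they are not taken from any DLIS library.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    units: str
    dimension: tuple  # shape of one sample: () for a scalar, (4,) for 4 values per depth
    samples: list     # one entry per depth


@dataclass
class Frame:
    name: str
    index: str  # name of the channel that holds the depth axis
    channels: dict = field(default_factory=dict)

    def add(self, channel):
        # All channels in a frame share one depth axis,
        # so they must all have the same number of samples.
        for other in self.channels.values():
            if len(other.samples) != len(channel.samples):
                raise ValueError(f"{channel.name} does not match the frame's depth axis")
        self.channels[channel.name] = channel


# Hypothetical frame sampled every 6 in (0.1524 m).
frame = Frame(name="FRAME_6IN", index="DEPTH")
frame.add(Channel("DEPTH", "m", (), [1000.0, 1000.1524, 1000.3048]))
frame.add(Channel("GAMMA", "gAPI", (), [55.1, 57.3, 60.2]))
frame.add(Channel("AMPLITUDE", "mV", (4,), [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]))
```

The depth axis is just another channel in the frame, and a two-dimensional channel such as `AMPLITUDE` simply holds several values per depth.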
By the way, a .dlis file on your disk actually contains one or more “logical” DLIS files. Each of these has its own sets and channels. Thankfully, having more than one logical file in a disk file is the exception rather than the rule.
Problems with DLIS
There is no getting around the fact that DLIS is an old and awkward format from a very different time. As an example of how archaic it can be, the DLIS standard dedicates significant space and complexity to structures that allow splitting files across multiple magnetic tapes. It defines a frankly unnecessary number of non-standard data types. Reading it gives the impression that efforts to save disk space have strongly increased its complexity. In fact, reading the standard at all is not an easy task, as it is fairly impenetrable.
Moreover, while DLIS is a standardised format, I have heard from several people who have looked into its guts that it is not always specified unambiguously. Consequently, different companies and software packages use their own dialects of DLIS to some degree. Therefore, their files may not be entirely compatible with one another. This makes the task of reading DLIS files harder, as you have to compensate for these kinds of differences.
For these reasons and more, writing your own DLIS reader is an extremely challenging task. It’s not something that a lone researcher should take on unless they are willing to dedicate a lot of time to it. To make their DLIS reader robust, they would also need to have example files from a wide range of DLIS file producers available.
Alternative well log formats
At the time that DLIS was originally published, other formats were already common. The Log Information Standard (LIS), published by Schlumberger in 1979, was the predecessor to DLIS. However, this format is supposedly even more difficult to deal with than DLIS, and thankfully you probably don’t have to unless you are trying to read really old well log files.
Another format is the Log ASCII Standard (LAS), published by the Canadian Well Logging Society around 1990. Unlike the binary DLIS and LIS formats, LAS files are pure text files. A LAS file typically starts with a block of well information and another block of log parameters. Then, it finishes with the channel data in an easily parsed CSV-like format.
It is both an advantage and a disadvantage that LAS files are text-based. The advantage is that humans can easily read LAS files, and writing a parser for them is relatively simple. Text-based formats have some disadvantages compared to binary formats, however. Representing numbers by text strings takes a lot more disk space, can reduce the precision of the data, and makes files slower to read.
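The trade-off can be illustrated with a few lines of standard-library Python. This is only a sketch with made-up values, and the four-decimal text format is an assumption about what a typical exporter might write.

```python
import struct

values = [1234.56789012345, 0.000123456789, 98765.4321]

# Binary: each IEEE 754 double takes exactly 8 bytes and round-trips exactly.
binary = struct.pack(f"<{len(values)}d", *values)
restored_binary = list(struct.unpack(f"<{len(values)}d", binary))

# Text with a fixed number of decimals (assumed here to be four).
text_fixed = ",".join(f"{v:.4f}" for v in values)
restored_fixed = [float(s) for s in text_fixed.split(",")]

# Text with full round-trip precision keeps the values, but takes more space.
text_full = ",".join(repr(v) for v in values)
restored_full = [float(s) for s in text_full.split(",")]

print(len(binary), len(text_fixed), len(text_full))
print(restored_binary == values, restored_fixed == values, restored_full == values)
```

The fixed-decimal text loses digits, while the text that preserves every value is noticeably larger than the 24 bytes of binary data. On top of that, text has to be parsed back into numbers when you read it.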
An example of a newer well log format is the JSON Well Log Format that Petroware published in 2019. It stores log data in the text-based JSON format, for which most widely-used programming languages have libraries. Despite such efforts, however, we will still have to deal with DLIS files for the time being.
My goal for reading DLIS files was to get the data into another format that I could read quickly from Python for the purposes of processing and/or machine learning. I had a few criteria that a good approach should meet:
- Completeness: I wanted to be able to read all the information available in a DLIS file; I would not be able to know what data would be useful before examining it further.
- Precision: I did not want to lose information by reducing the precision of floating-point numbers, as you typically do while converting numbers from binary to text.
- Speed: As data reading speed is essential for machine learning, I wanted to be able to read any desired information into Python very quickly.
- Automation: There are many non-scriptable GUI programs that can export data. However, this is extremely time-consuming when you are trying to export from hundreds of files. I therefore wanted an approach that I could automate, e.g. through scripting.
- Price: Ideally, I wanted to use free software or libraries, as I have a quite limited research budget.
DLIS tools and approaches
I have gathered some information about possible tools and approaches for reading DLIS files, and I’ll present the most interesting ones to you. I’ll skip most of the commercial tools, as I have not been able to test most of them, and they obviously fail the price criterion. Additionally, people have told me that many of them fail the automation and precision criteria as well. You don’t want to have to pay money for a program where you still have to click through a GUI for every DLIS file you want to export data from.
Log Data Toolbox
This is a free toolbox from Schlumberger, who published its latest version, 2.3, in 2009. The toolbox contains a variety of programs to work with well log data. While this is Windows-only software, I can also run it fine on macOS using Wine. The most interesting programs are:
DLIS InfoView
While this program cannot actually export channel data from DLIS files, it is still very useful. It can generate great overviews of the contents of the files. These include all of the log parameters stored in a file, and a list of frames and the channels that the file contains. These channels are organised by the tools that they belong to. You can even export the overview as HTML.
Moreover, this program has a command-line interface, which means that you can script it. For example, you can script the program to export a nice HTML summary of every DLIS file that you have.
DLIS to ASCII
This program lets you convert DLIS files to a number of different text-based formats. These include a variety of LAS standards, in addition to several CSV-like formats. In the program, you can select which channels and parameters you want to export. In fact, you have to; if you wanted to export all channels and parameters in the GUI, you would have to laboriously click hundreds of boxes for each file you want to get data from.
Thankfully, the program also has a command-line interface so that you can script it. Even so, you would have to specify for each file which channels and parameters you want to export. This requires that you know the channels and parameters available in each file ahead of time. While this information is available in the HTML files that DLIS InfoView can provide, parsing these files and scripting everything is a fairly complex task. Additionally, the available export formats do not support different channels having multiple depth resolutions. It is possible to specify any depth resolution for export, and the program will downsample higher-resolution channels and interpolate lower-resolution channels. However, this will lose information from the higher-resolution channels, and unnecessarily bloat the file with redundant information from lower-resolution channels. Alternatively, you could export one file for each depth resolution, but this adds to the complexity.
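To see why putting everything on a single depth grid is lossy in one direction and redundant in the other, consider this small sketch. I do not know exactly how DLIS to ASCII resamples, so plain decimation and linear interpolation are assumptions here, and the channel values are made up.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample, as when moving a fine channel to a coarser grid."""
    return samples[::factor]


def interpolate(samples, factor):
    """Linearly interpolate `factor - 1` new values between neighbouring samples."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend(a + (b - a) * k / factor for k in range(factor))
    out.append(samples[-1])
    return out


fine = [0.0, 5.0, 1.0, 7.0, 2.0, 9.0, 3.0]  # hypothetical channel sampled every 1 in
coarse = [10.0, 16.0]                        # hypothetical channel sampled every 6 in

fine_on_coarse = decimate(fine, 6)       # 7 samples shrink to 2: the variation is gone
coarse_on_fine = interpolate(coarse, 6)  # 2 samples grow to 7: no new information

print(fine_on_coarse)
print(coarse_on_fine)
```

Exporting on the 6 in grid throws away five of every six samples from the fine channel. Exporting on the 1 in grid stores seven numbers for the coarse channel where two would do.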
Technically, you could use the Log Data Toolbox to build a data export approach that fulfills the criteria above. Even precision can be fulfilled, as DLIS to ASCII lets you specify how many decimals you want in your output. However, using these tools to export data would be a complex and annoying scripting job. You would also end up with bloated text files that you would have to convert to another format to be able to read the data quickly.
dlispy
Teradata recently published this Python library under the free BSD-3-clause licence. Its main functionality is to read DLIS files and dump their contents to text files. dlispy dumps the DLIS sets as a JSON file, and each DLIS frame as a CSV file. From what I can tell, the dump is fairly complete.
Reading the CSV files is fairly straightforward. Each line represents one depth, and contains the values of all the frame’s channels for that depth. However, it gets a bit trickier for multidimensional channels. Instead of just having one value per line, they have a string containing a list of values. Additionally, channels with more than two dimensions (which is admittedly rare) are flattened to two. You have to look them up in the DLIS sets to be able to unflatten them. And understanding the dumped DLIS sets requires some work.
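As a rough idea of what reading such a file could look like, here is a sketch using only the Python standard library. The CSV excerpt, the column names, the way the list is written, and the row-major ordering are all assumptions for illustration; the actual dlispy output may well differ in its details.

```python
import ast
import csv
import io

# Hypothetical excerpt of a frame CSV; real column names and layout may differ.
csv_text = '''DEPTH,GAMMA,IMAGE
1000.0,55.1,"[1, 2, 3, 4, 5, 6]"
1000.5,57.3,"[7, 8, 9, 10, 11, 12]"
'''


def unflatten(flat, dimension):
    """Reshape a flat list into nested lists of the given dimension (row-major assumed)."""
    if len(dimension) == 1:
        return list(flat)
    step = len(flat) // dimension[0]
    return [
        unflatten(flat[i * step:(i + 1) * step], dimension[1:])
        for i in range(dimension[0])
    ]


# Assumed to have been looked up in the dumped DLIS sets beforehand.
image_dimension = (2, 3)

rows = []
for record in csv.DictReader(io.StringIO(csv_text)):
    rows.append({
        "DEPTH": float(record["DEPTH"]),
        "GAMMA": float(record["GAMMA"]),
        # The multidimensional channel arrives as a string holding a list.
        "IMAGE": unflatten(ast.literal_eval(record["IMAGE"]), image_dimension),
    })

print(rows[0]["IMAGE"])
```

Using `ast.literal_eval` rather than `eval` keeps the parsing safe even if a file contains something unexpected.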
Even though its documentation is not good and using it is not entirely straightforward, you can certainly use dlispy to extract data from DLIS files. It fulfills my criteria: It is free, can be automated, and seems to extract all the data. While it does export the data to text files, the decimal precision seems to be high, and you can subsequently convert these files to other formats that are quicker to read. I have not tried it extensively, however, and I do not know if it is able to read a wide variety of DLIS files.
I should also mention that its developers seem to have abandoned dlispy. At the time of writing, they have made only two commits on GitHub, in November 2018, and there has been no activity since.
dlisio
This is another Python library that Equinor recently published under the free LGPL licence. While the library is currently under heavy development and only has API documentation, it is already usable. You can use it directly in Python to explore the contents of DLIS files. While reading data directly from the DLIS files is somewhat slow due to the design of the DLIS standard, you can fairly easily export the data to another, faster format. (One idea is exporting to HDF5 using the h5py Python library.)
Some things are still missing, though. For example, it would be very useful to have a link from a channel or frame to its corresponding depth axis. It is still possible to find, but more of a hassle than necessary. [Update: This feature was added to dlisio in October 2019.] And while dlisio can give you the units of DLIS channels, it does not yet provide the units of measurement parameters. Thus, you cannot tell if a parameter value is in seconds, milliseconds, microseconds, or something else.
Even though it may still lack some features, dlisio is a good bet if you want to extract data from DLIS files. Again, it is free, easily automated, and looks like it can extract all the data from a DLIS file. Unlike dlispy, dlisio lets you access the data directly without having to export text files. Another advantage of dlisio is that its developers are currently very active, and they are always improving the library.
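As a starting point for exploring a file, something like the following sketch lists the frames and channels of every logical file. It assumes a recent dlisio release where the loader lives at `dlisio.dlis.load` (early releases exposed it as `dlisio.load`) and where frames and channels have `name`, `channels`, `units`, and `dimension` attributes. The `summarise` and `summarise_dlis` helpers are my own names, not part of dlisio.

```python
from types import SimpleNamespace


def summarise(logical_files):
    """Return {logical file number: {frame name: [(channel, units, dimension), ...]}}."""
    summary = {}
    for i, logical_file in enumerate(logical_files):
        summary[i] = {
            frame.name: [(ch.name, ch.units, list(ch.dimension)) for ch in frame.channels]
            for frame in logical_file.frames
        }
    return summary


def summarise_dlis(path):
    # Imported here so that summarise() can be tried without dlisio installed.
    # Assumes a recent dlisio release; early releases used dlisio.load instead.
    from dlisio import dlis

    with dlis.load(path) as logical_files:
        return summarise(logical_files)


# Stand-in objects that mimic the assumed attributes, to show the shape of the result.
fake_channel = SimpleNamespace(name="GR", units="gAPI", dimension=[1])
fake_frame = SimpleNamespace(name="FRAME_6IN", channels=[fake_channel])
fake_file = SimpleNamespace(frames=[fake_frame])
print(summarise([fake_file]))
```

With dlisio installed, calling `summarise_dlis` with the path to one of your own .dlis files should give the same kind of overview for real data.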
Log I/O
This is a commercial library for Java and .NET, published by Petroware. They first made it public in 2013, and they have been improving it since. Petroware’s parent company Logtek performs QA on most well logs from the Norwegian continental shelf using software based on this library. For that reason, you can expect Log I/O to be able to read pretty much any DLIS file that you can throw at it. Log I/O also has good documentation, both for Java and .NET, and I found it quite easy to use. The only problem I had with it was when reading a few very large files, the largest being over 6 GB. (I had the same problem on macOS and Linux, and it could not be solved simply by increasing the Java heap size.) Still, I found a workaround by using a Log I/O feature to slowly read the files bit by bit.
If you don’t mind using Java or .NET, Log I/O fulfills all of my criteria except one: It is not free, and a licence costs a few thousand EUR or USD. The exact amount will depend on the licence type and the features you want. If you are considering using it, though, you may be able to get a time-limited free trial version if you contact Petroware.
So, what to do?
When I needed to extract data from DLIS files, neither dlispy nor dlisio had been published. I therefore ended up buying a one-year licence to Log I/O. I used its Java version to read the contents of DLIS files, and used the HDF Object Package for Java to write to the HDF5 format. The latter library is part of the HDFView software, but you can also use it as a separate library. Using some Python helper modules that I wrote, I can then read these HDF5 files easily and efficiently into Python. In other words, an approach based on Log I/O has worked quite well for me.
But what should you do, now that some more options are available? Well, it depends on a few things, namely price and your DLIS files. If you want to go for a free alternative and don’t mind using Python, look into dlisio and/or dlispy and see if you are able to read your DLIS files without problems. If you have money to spend and don’t mind using Java or .NET, Log I/O is a safe and good option.
Regardless of what you choose, you probably also want to export the DLIS contents to a format that you can read efficiently. A simple option could be a structure with folders containing binary files. Each frame is a folder, and its channels are binary files. However, it’s not quite that easy. You also need to store metadata, such as each channel’s units or numeric data type. For that reason, I think the HDF5 format is a good option. It supports a nice folder-like structure for your data, and DLIS channels of any number of dimensions can be nicely represented as datasets in HDF5. These datasets are essentially n-dimensional arrays. HDF5 also lets you annotate each folder and dataset with metadata if you like. Not only that, but pretty much any programming language has libraries for HDF5.
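Here is a minimal sketch of such a layout using the h5py and NumPy libraries. The group and channel names, the units, and the values are all made up for illustration; in practice the arrays would come from whichever DLIS reader you use, and this is not necessarily the layout I use myself.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical frame content standing in for data read from a DLIS file.
depth = np.arange(1000.0, 1000.5, 0.125)                # 4 depth samples
gamma = np.array([55.1, 57.3, 60.2, 58.8])              # one value per depth
waveform = np.zeros((depth.size, 8), dtype=np.float32)  # 8 values per depth

path = os.path.join(tempfile.mkdtemp(), "well.h5")

with h5py.File(path, "w") as h5:
    # One group per frame, like a folder; intermediate groups are created as needed.
    frame = h5.create_group("frames/FRAME_EXAMPLE")
    frame.attrs["index_channel"] = "DEPTH"
    for name, data, units in [
        ("DEPTH", depth, "m"),
        ("GAMMA", gamma, "gAPI"),
        ("WAVEFORM", waveform, "mV"),
    ]:
        dataset = frame.create_dataset(name, data=data)  # keeps shape and dtype
        dataset.attrs["units"] = units                   # metadata lives next to the data

with h5py.File(path, "r") as h5:
    dataset = h5["frames/FRAME_EXAMPLE/WAVEFORM"]
    shape = dataset.shape
    units = dataset.attrs["units"]
    gamma_back = h5["frames/FRAME_EXAMPLE/GAMMA"][:]     # read the whole channel

print(shape, units)
```

Each dataset remembers its own shape and numeric type, so the only metadata you have to add yourself is what the format cannot know, such as units.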
But regardless of how you end up getting at the sweet, juicy data inside your DLIS files, I wish you success!
All log data and log information shown in this post is taken from Equinor’s Volve Data Village dataset. This dataset is available under the CC BY-NC-SA 4.0 licence.