I tried to write a simple script that converts xlsx files to PDF files using my existing LibreOffice installation.
My OS is Linux Mint 20 Ulyana.
When I run the following script, it just hangs. The sequential approach (the commented-out loop near the end of the script) works, of course.
Can you please advise me on what could be wrong and how to debug this code? I suspect that two processes are somehow trying to convert the same file.
Code:
#!/usr/local/bin/python3
import os
import subprocess
import multiprocessing as mp
import time


def convert_pdf_soffice(xlsx_file):
    out_dir = './PdfDir/'
    print('Started conversion of ', xlsx_file)
    subprocess.run(['soffice', '--headless', '--convert-to', 'pdf', '--outdir', out_dir, xlsx_file])
    print('Finished conversion of ', xlsx_file)


if __name__ == '__main__':
    start_t = time.time()
    input_directory = './XLSX/'
    output_directory = './PdfDir/'
    # Create the output folder if it does not exist
    if not os.path.exists(output_directory):
        os.makedirs(output_directory)
    existing_pdf_files = [file for file in os.listdir(output_directory) if file.endswith('.pdf')]
    # Swap the extension: '.pdf' becomes '.xlsx'
    already_converted_xlsx = [file[:-4] + '.xlsx' for file in existing_pdf_files]
    # List of all xlsx files
    xlsx_file_list = [file for file in os.listdir(input_directory) if file.endswith('.xlsx')]
    # List of xlsx files that actually need to be converted to pdf
    xls_files_to_be_converted = [os.path.join(input_directory, file) for file in xlsx_file_list if file not in already_converted_xlsx]
    print('Length of the list is ', len(xls_files_to_be_converted))
    # Multiprocessing conversion
    with mp.Pool(processes=mp.cpu_count()) as pool:
        result = pool.map(convert_pdf_soffice, xls_files_to_be_converted)
    # Sequential conversion (this works)
    #for file in xls_files_to_be_converted:
    #    convert_pdf_soffice(file)
    end_t = time.time()
    duration_t = end_t - start_t
    print(f'Duration is {duration_t}')
Output:
Length of the list is 9
Started conversion of ./XLSX/File6.xlsx
Started conversion of ./XLSX/File2.xlsx
Started conversion of ./XLSX/File1.xlsx
Started conversion of ./XLSX/File7.xlsx
Started conversion of ./XLSX/File3.xlsx
Started conversion of ./XLSX/File5.xlsx
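One thing that might be worth trying, as an untested sketch: I am assuming here that the hang comes from several soffice instances sharing one user profile, not from two workers getting the same file (pool.map hands each list item to exactly one worker). The variant below gives every conversion its own temporary profile through the -env:UserInstallation bootstrap setting and adds a timeout, so a stuck conversion is reported instead of blocking forever. The helper name build_soffice_command, the 120 second timeout, and the pool size of 2 are placeholders of mine, not anything from the script above.

```python
import multiprocessing as mp
import subprocess
import tempfile
from pathlib import Path


def build_soffice_command(xlsx_file, out_dir, profile_dir):
    """Build the soffice call with a dedicated user profile (hypothetical helper)."""
    # UserInstallation expects a file:// URI, so convert the absolute path.
    profile_uri = Path(profile_dir).resolve().as_uri()
    return ['soffice', f'-env:UserInstallation={profile_uri}', '--headless',
            '--convert-to', 'pdf', '--outdir', out_dir, xlsx_file]


def convert_pdf_soffice(xlsx_file, out_dir='./PdfDir/', timeout=120):
    """Convert one file; return (file, return code or None, stderr or message)."""
    # Each call gets a throwaway profile directory that is removed afterwards.
    with tempfile.TemporaryDirectory(prefix='soffice_profile_') as profile_dir:
        cmd = build_soffice_command(xlsx_file, out_dir, profile_dir)
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # Assumed timeout value; a hang now shows up as a reported failure.
            return xlsx_file, None, 'timed out'
        return xlsx_file, result.returncode, result.stderr


if __name__ == '__main__':
    files = sorted(str(p) for p in Path('./XLSX/').glob('*.xlsx'))
    # Small pool first; raise the number once the conversions are stable.
    with mp.Pool(processes=2) as pool:
        for name, code, err in pool.imap_unordered(convert_pdf_soffice, files):
            print(name, code, err)
```

Printing the return code and stderr per file should at least show which conversion gets stuck, and running with processes=1 first would tell whether the problem only appears with concurrent instances.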