Repeat text extraction with Python



I have the following code, which I would like to use to extract the text between <font color='#FF0000'> and </font>. It works, but it only extracts the first unit, whereas I would like to extract every unit of text between these tags. I tried wrapping it in a bash loop, but that didn't work.



import os

directory_path = 'C:\\My_folder\\tmp'

for files in os.listdir(directory_path):
    print(files)
    path_for_files = os.path.join(directory_path, files)

    # Read the whole file; the with-block closes the handle afterwards
    with open(path_for_files, mode='r', encoding='utf-8') as f:
        text = f.read()

    starting_tag = '<font color='
    ending_tag = '</font>'

    # find() returns only the first occurrence, so only one unit is extracted
    ground = text[text.find(starting_tag):text.find(ending_tag)]

    results_dir = 'C:\\My_folder\\tmp'
    results_file = files[:-4] + 'txt'

    path_for_files = os.path.join(results_dir, results_file)

    # Write the extracted slice (the original wrote an undefined name, result)
    with open(path_for_files, mode='w', encoding='utf-8') as f:
        f.write(ground)
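One possible way to get every unit rather than just the first is to let a regular expression collect all matches at once. The sketch below is only an illustration. It assumes the opening tag is always written exactly as <font color='#FF0000'> and that the tags are not nested. The names FONT_PATTERN, extract_units and sample are made up for this example and are not part of the original code.

```python
import re

# Assumption: the opening tag is always exactly <font color='#FF0000'>.
# The non-greedy (.*?) stops at the nearest </font>, and re.DOTALL lets
# a unit span several lines.
FONT_PATTERN = re.compile(r"<font color='#FF0000'>(.*?)</font>", re.DOTALL)


def extract_units(text):
    """Return every piece of text found between the font tags, in order."""
    return FONT_PATTERN.findall(text)


# Hypothetical sample input with two tagged units
sample = "a <font color='#FF0000'>first</font> b <font color='#FF0000'>second</font> c"
units = extract_units(sample)
print("\n".join(units))
```

Inside the loop over files, the single slice could then be replaced by a call such as extract_units(text), with the returned list joined by newlines before it is written to the results file. For HTML that is messier than this (nested tags, varying attribute quoting), a proper HTML parser would probably be more robust than a regular expression.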
