Extracting data from Excel files

Question

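I want to extract the parts I need from Excel files (daily reports, 日報) whose file names follow the patterns below. A few days ago I posted about files with a different naming scheme.
What I want to achieve and the contents of the Excel sheets are the same as before. I would appreciate it if you could tell me what to change in the code I got from that earlier post. Thank you in advance.

File name ①: 2017.6.6XXXX日報
File name ②: 2017.09.12XXXX日報
File name ③: 2017.06.06XXXX日報
File name ④: XXXX日報20140715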

Code from my previous post

```
import glob
import locale
import os
import re
from datetime import datetime

import openpyxl as xl

# Apply the user's default locale settings (affects the date formatting below).
locale.setlocale(locale.LC_ALL, '')

# Get the list of Excel files in the 日報 folder.
xlsx_files = os.path.join('日報', '*.xlsx')
xlsx_paths = sorted(glob.glob(xlsx_files))

out_wb = xl.Workbook()  # Create a new workbook for the output.
out_ws = out_wb.active

row_offset = 1

# Header row ('タイトル' = title, '更新日時' = last modified)
headers = ['タイトル', '更新日時', 'B', 'C', 'D', 'E', 'F',
           'G', 'H', 'I', 'J', 'K', 'L']
for c, label in enumerate(headers, 1):
    out_ws.cell(row=row_offset, column=c).value = label
row_offset += 1

for path in xlsx_paths:
    print('reading... ', path)

    # Get the last-modified time (named mtime so it does not shadow the datetime class).
    mtime = datetime.fromtimestamp(os.path.getmtime(path))
    modified = '{0:%Y年%m月%d日 %H時%M分}'.format(mtime)
    # Load the Excel file (data_only=True reads cached formula results, not formulas).
    wb = xl.load_workbook(path, data_only=True)
    ws = wb.active
    # Extract the date part from the file name.
    matches = re.search(r'日報(\d{1,2}).(\d{1,2})', os.path.basename(path))
    month, date = matches.groups()
    title = '{}/{}'.format(month, date)

    # Copy the values in B14:L28.
    out_ws.cell(row=row_offset, column=1).value = title  # title
    out_ws.cell(row=row_offset, column=2).value = modified  # last modified
    for r, row in enumerate(ws['B14:L28']):
        for c, cell in enumerate(row):
            out_ws.cell(row=row_offset + r, column=3 + c).value = cell.value
    row_offset += 15

    # Copy the values in B34:L48.
    out_ws.cell(row=row_offset, column=1).value = title  # title
    out_ws.cell(row=row_offset, column=2).value = modified  # last modified
    for r, row in enumerate(ws['B34:L48']):
        for c, cell in enumerate(row):
            out_ws.cell(row=row_offset + r, column=3 + c).value = cell.value
    row_offset += 15

# Save the output.
out_wb.save('output.xlsx')
```

Accepted Answer

## How to paste code into a question

When you paste code, please wrap it in a code block.
Otherwise the indentation gets lost.
You can still edit your question after posting.

![Image description](59662c945089f19da5a57aa39b3cb469.png)

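## Extracting the date from the file name

Pulling part of a string out like this comes down to writing a regular expression.
If you're not familiar with regular expressions, they're very handy and well worth learning.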

[サルにもわかる正規表現入門](https://www.mnet.ne.jp/~nakama/) (a beginner's regular expression tutorial, in Japanese)
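To see why the current pattern fails on the new names, here is a small sketch that runs the original `r'日報(\d{1,2}).(\d{1,2})'` against names modeled on the four examples. The `XXXX` placeholder and the `.xlsx` extension are assumptions, and `old_title()` is a hypothetical wrapper around the original three lines. The pattern expects digits right after 日報, so the dotted names do not match at all (the original code then crashes on `matches.groups()`), and for `XXXX日報20140715` the unescaped `.` matches any character, so the result is meaningless.

```python
import re

# The pattern from the original script, copied as-is.
OLD_PATTERN = r'日報(\d{1,2}).(\d{1,2})'

# Names modeled on the question's examples ("XXXX" and ".xlsx" are assumed).
names = [
    '2017.6.6XXXX日報.xlsx',
    '2017.09.12XXXX日報.xlsx',
    '2017.06.06XXXX日報.xlsx',
    'XXXX日報20140715.xlsx',
]


def old_title(name):
    """Hypothetical wrapper: return 'month/day' like the original code, or None."""
    matches = re.search(OLD_PATTERN, name)
    if matches is None:
        # The original code calls matches.groups() here and raises
        # AttributeError: 'NoneType' object has no attribute 'groups'.
        return None
    month, day = matches.groups()
    return '{}/{}'.format(month, day)


for name in names:
    print(name, '->', old_title(name))
```

This prints `None` for the three dotted names and `20/40` for `XXXX日報20140715.xlsx` (the `.` swallows the `1`).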

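Below is sample code that uses regular expressions to extract the year, month, and day from the file names in your question and returns them as a string.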

```
import re

def get_title(filename):
    # Match the "テキスト日報20140715.xlsx" pattern (yyyymmdd right after 日報).
    matches = re.match(r'^.*日報(\d{4})(\d{2})(\d{2})\.xlsx$', filename)
    if matches:
        year, month, date = matches.groups()
        return '{:0>4}/{:0>2}/{:0>2}'.format(year, month, date)

    # Match the "2017.6.6テキスト日報.xlsx" pattern (dotted date before the text).
    # The dots are escaped as \. so they match only a literal period.
    matches = re.match(r'^(\d{4})\.(\d{1,2})\.(\d{1,2})[^\d].*日報\.xlsx$', filename)
    if matches:
        year, month, date = matches.groups()
        # '{:0>2}' pads a one-digit value such as '6' to '06'.
        return '{:0>4}/{:0>2}/{:0>2}'.format(year, month, date)

    raise ValueError('Failed to parse specified string.')


# Tests (only for checking; you don't need to copy this part.)
assert get_title('2017.6.6テキスト日報.xlsx') == '2017/06/06'
assert get_title('2017.09.12テキスト日報.xlsx') == '2017/09/12'
assert get_title('2017.06.06テキスト日報.xlsx') == '2017/06/06'
assert get_title('テキスト日報20140715.xlsx') == '2014/07/15'
```
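If you later need to sort or compare the dates, one option is to return a `datetime.date` instead of a string. The sketch below is a hypothetical variant (`parse_report_date` and `PATTERNS` are names made up for this example) that uses the same two file-name patterns with named groups. Building a `date(...)` also rejects impossible values such as month 13 with a `ValueError`.

```python
import re
from datetime import date

# Hypothetical variant of get_title(): the same two file-name patterns,
# written with named groups and compiled once.
PATTERNS = [
    # e.g. "テキスト日報20140715.xlsx" (yyyymmdd right after 日報)
    re.compile(r'^.*日報(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.xlsx$'),
    # e.g. "2017.6.6テキスト日報.xlsx" / "2017.09.12テキスト日報.xlsx"
    re.compile(r'^(?P<year>\d{4})\.(?P<month>\d{1,2})\.(?P<day>\d{1,2})[^\d].*日報\.xlsx$'),
]


def parse_report_date(filename):
    """Return the date encoded in a report file name as a datetime.date."""
    for pattern in PATTERNS:
        m = pattern.match(filename)
        if m:
            # date() raises ValueError for impossible values (e.g. month 13).
            return date(int(m.group('year')), int(m.group('month')), int(m.group('day')))
    raise ValueError('Failed to parse specified string: {}'.format(filename))


names = ['2017.09.12テキスト日報.xlsx', 'テキスト日報20140715.xlsx', '2017.6.6テキスト日報.xlsx']
# Sorting by the parsed date gives chronological order across both naming schemes.
for name in sorted(names, key=parse_report_date):
    print(parse_report_date(name).strftime('%Y/%m/%d'), name)
```

`strftime('%Y/%m/%d')` turns the date back into the same `2017/06/06` style string that `get_title()` returns, so the title column in the output could stay as it is.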

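Put `get_title()` near the top of your script (after the imports), then replace the following part of your code and try it.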

Before
```
matches = re.search(r'日報(\d{1,2}).(\d{1,2})', os.path.basename(path))
month, date = matches.groups()
title = '{}/{}'.format(month, date)
```

After
```
title = get_title(os.path.basename(path))
```
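`get_title()` raises `ValueError` when a name matches neither pattern, so a single unexpected file in the 日報 folder would stop the whole run. A minimal sketch, assuming you would rather skip such files and keep going: wrap the call in `try`/`except` and `continue` to the next file. `titles_for()` and the sample paths (including `memo.xlsx`) are hypothetical. In your own loop you would put the same `try`/`except` around the `title = get_title(...)` line; since that line comes before any rows are written, `row_offset` stays unchanged for skipped files.

```python
import os
import re


def get_title(filename):
    """Condensed copy of get_title() with escaped dots and the .xlsx extension."""
    for pattern in (r'^.*日報(\d{4})(\d{2})(\d{2})\.xlsx$',
                    r'^(\d{4})\.(\d{1,2})\.(\d{1,2})[^\d].*日報\.xlsx$'):
        matches = re.match(pattern, filename)
        if matches:
            year, month, date = matches.groups()
            return '{:0>4}/{:0>2}/{:0>2}'.format(year, month, date)
    raise ValueError('Failed to parse specified string.')


def titles_for(paths):
    """Hypothetical helper: return (path, title) pairs, skipping files
    whose names match neither pattern instead of aborting the run."""
    results = []
    for path in paths:
        try:
            title = get_title(os.path.basename(path))
        except ValueError:
            print('skipped (unrecognized file name):', path)
            continue
        results.append((path, title))
    return results


paths = [os.path.join('日報', n) for n in
         ['2017.6.6XXXX日報.xlsx', 'XXXX日報20140715.xlsx', 'memo.xlsx']]
for path, title in titles_for(paths):
    print(title, path)
```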
