stackoverflow November 29, 2025 tech Rep: 21

BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction

Views: 105

Trend Score: 19.5

Question Details

No question body available.

Answers (2)

January 1, 2026 Score: 0 Rep: 71,727 Quality: Medium Completeness: 80%

The solution below walks the page's top-level content elements. It anchors the search on h2 headings that contain a span.mw-headline (the introductory text has no such heading), and then checks whether the next sibling element is the tabber that stores the memory names and descriptions:

import csv

import bs4
import requests
from bs4 import BeautifulSoup as soup

def get_data():
    d = requests.get('https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening').text
    page = soup(d, 'html.parser')
    # get all body content, skipping bare text nodes (whitespace, comments)
    vals = [i for i in page.select_one('.mw-content-ltr.mw-parser-output').children if not isinstance(i, bs4.NavigableString)]
    results = []
    while vals:
        node = vals.pop(0)
        if node.name == 'h2' and node.select_one('span.mw-headline'):
            # lookahead: only consume the next sibling if it is the tabber storing both tabs and content
            if vals and vals[0].get('class', []) == ['tabber', 'wds-tabber']:
                node2 = vals.pop(0)
                results.append([node.get_text(strip=True), ''])
                tabs = [i.get_text(strip=True) for i in node2.select('ul.wds-tabs > li.wds-tabs__tab')]
                text = [i.get_text(strip=True) for i in node2.select('div.wds-tab__content i')]
                results.extend([*map(list, zip(tabs, text))])
    # write one [heading, ''] row per section, followed by its [tab title, description] rows
    with open('memory_titles.csv', 'w', newline='', encoding='utf-8') as f:
        write = csv.writer(f)
        write.writerows(results)

get_data()

You can see the contents of memory_titles.csv here.
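To try the heading-anchored lookahead without requesting the live site, here is a minimal offline sketch. It assumes markup shaped like the Fandom tabber that the selectors above target; the sample HTML and the names SAMPLE_HTML, next_tag_sibling and extract_tabbed_sections are hypothetical, and the real page's markup may differ. Instead of popping from a list, it loops over each top-level h2 and inspects only the next element sibling:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample markup modeled on the tabber structure the selectors target.
SAMPLE_HTML = """
<div class="mw-content-ltr mw-parser-output">
  <p>Introductory text that should be ignored.</p>
  <h2><span class="mw-headline">Level 1</span></h2>
  <div class="tabber wds-tabber">
    <ul class="wds-tabs">
      <li class="wds-tabs__tab">Memory 1</li>
      <li class="wds-tabs__tab">Memory 2</li>
    </ul>
    <div class="wds-tab__content"><i>First description.</i></div>
    <div class="wds-tab__content"><i>Second description.</i></div>
  </div>
  <h2><span class="mw-headline">Notes</span></h2>
  <p>A section without a tabber.</p>
</div>
"""


def next_tag_sibling(node):
    """Return the next sibling that is an element, skipping text nodes."""
    for sibling in node.next_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


def extract_tabbed_sections(html):
    """Return a [heading, ''] row per tabbed section, then its [tab, description] rows."""
    page = BeautifulSoup(html, "html.parser")
    content = page.select_one(".mw-content-ltr.mw-parser-output")
    results = []
    # recursive=False: only headings that are direct children of the content div
    for heading in content.find_all("h2", recursive=False):
        if heading.select_one("span.mw-headline") is None:
            continue
        tabber = next_tag_sibling(heading)
        if tabber is None or tabber.get("class", []) != ["tabber", "wds-tabber"]:
            continue  # heading not followed by a tabber, e.g. "Notes"
        results.append([heading.get_text(strip=True), ""])
        tabs = [li.get_text(strip=True) for li in tabber.select("ul.wds-tabs > li.wds-tabs__tab")]
        texts = [i.get_text(strip=True) for i in tabber.select("div.wds-tab__content i")]
        results.extend([tab, text] for tab, text in zip(tabs, texts))
    return results


print(extract_tabbed_sections(SAMPLE_HTML))
```

Because only the element immediately after each heading is checked, a heading followed by ordinary paragraphs (like "Notes" in the sample) is skipped. The introductory paragraph never matches at all, since it is not an h2.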

November 29, 2025 Score: 0 Rep: 1 Quality: Low Completeness: 40%

I would recommend using the MediaWiki API, which Fandom supports, for this task. Wikitext may be easier to parse than HTML for your use case. I can't tell exactly what operation you're trying to do, but whatever it is, it's probably easier to do on the page's wikitext markup than on the rendered HTML.

https://deadbydaylight.fandom.com/api.php?action=parse&page=Tome_1_-_Awakening&prop=wikitext&formatversion=2

I got the API endpoint above from method 3 on this page of the MediaWiki API documentation: https://www.mediawiki.org/wiki/API:Get_the_contents_of_a_page
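A minimal sketch of the wikitext route, using only the standard library. It assumes the page title Tome_1_-_Awakening and adds format=json so the response is machine-readable JSON. The helpers parse_api_url, fetch_wikitext and split_level2_sections and the User-Agent string are hypothetical, and splitting on "== Heading ==" lines is a simplification that ignores templates and other wikitext constructs:

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://deadbydaylight.fandom.com/api.php"


def parse_api_url(page):
    """Build the action=parse URL that returns a page's wikitext."""
    params = {"action": "parse", "page": page, "prop": "wikitext",
              "format": "json", "formatversion": "2"}
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_wikitext(page):
    """Download the raw wikitext (needs network access)."""
    # Hypothetical User-Agent; MediaWiki asks API clients to identify themselves.
    req = urllib.request.Request(parse_api_url(page),
                                 headers={"User-Agent": "wikitext-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # With formatversion=2 the wikitext is a plain string under parse.wikitext.
    return data["parse"]["wikitext"]


# A level-2 heading line such as "== Level 1 ==" (the lookahead rejects "=== ... ===").
H2 = re.compile(r"^==(?!=)[ \t]*(.*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_level2_sections(wikitext):
    """Map each level-2 heading to the text beneath it; text before the first heading is dropped."""
    matches = list(H2.finditer(wikitext))
    sections = {}
    for idx, m in enumerate(matches):
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(wikitext)
        sections[m.group(1)] = wikitext[m.end():end].strip()
    return sections


sample = "Intro text.\n== Level 1 ==\nMemory 1 text.\n=== Detail ===\nMore.\n== Notes ==\nEnd."
print(split_level2_sections(sample))
```

Calling split_level2_sections(fetch_wikitext("Tome_1_-_Awakening")) would then discard everything before the first level-2 heading, i.e. the page introduction. Subsection headings (=== ... ===) stay inside their parent section's text.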
