🗂️ DATA
Overview
This project extracts, structures, and visualizes data from six African American tourism brochures produced by the Atlantic City Board of Trade (1936–1945). These materials are digitized artifacts of Black leisure and business during segregation—and the data drawn from them reflects not only what’s visible, but also what’s missing or contested.
“Owing to the rapid progress of the present civilized world along all lines, and more particularly along commercial lines, and since our group has kept pace with this upward progress both in the Professional and Business fields, we, the citizens of the Northside of Atlantic City, New Jersey, deem it fitting and proper to form an Organization through and by which, we hope to be able to meet the Commercial needs of the inhabitants of our community, and stimulate among them greater interest for cooperation and efficiency in their chosen Professions or Businesses and along all Commercial lines.”
— C.J. Newsome, Negro History Bulletin, 1952 —
- Years Covered: 1936, 1939, 1941, 1942, 1944, 1945
- Origin: Physical brochures held in the Atlantic City Free Public Library
- Scanned Formats: High-resolution PNGs and PDFs, each page representing two 4.5” × 9” brochure columns.
📊 Data Types Collected
| Field Name | Description |
|---|---|
Brochure | Year or ID of the source document |
Page | Brochure page number |
Section | Left/right column or subsection |
Name, Title | Individual names and honorifics (e.g., Prop., Proprietor) |
Business | Business name |
Address, City, State | Parsed from OCR; often inferred from nearby text |
Phone | Phone number (if present) |
Industry | Labeled by human and/or inferred from business type |
Image, Image2 | Associated cropped image files from original page |
Latitude, Longitude | Manually or semi-automatically geocoded |
Details | Notes on services, slogans, or other distinguishing info |
Section Type | Labeled as: advertisement, profile, letter, listing, image, etc. |
🧠 Data Extraction & Cleaning
- OCR: Processed using Tesseract, with custom image preprocessing for enhanced accuracy.
- Regex Parsing: Heuristics segment multi-line entries and detect structured patterns (name/title/address/phone).
- Manual Review: Every entry was reviewed and corrected, with additional metadata added where relevant.
- Duplicate Detection: Names and businesses appearing across multiple years are linked for longitudinal analysis.
🗺️ Geospatial Data
- Addresses geocoded using QGIS, historical city directories, and 1940s fire insurance maps.
- Where addresses are incomplete or missing, entries are tagged as approximate or grouped in contextual image galleries (e.g., “Boardwalk businesses”).
- A 1940 historical basemap provides visual grounding in the city’s past layout.
⚖️ Data Ethics
- No facial recognition, identity inference, or algorithmic scoring applied.
- OCR errors are preserved in full-text files to maintain historical integrity.
- All uncertain or missing values are marked explicitly (e.g.,
unknown,unreadable,inferred). - Image crops are stored with full source attribution, including page and year metadata.
📥 Downloads
Available soon via GitHub or project archive:
business_entries.csv– Structured text data with named fieldsimage_metadata.csv– Metadata for all cropped images, with source infoocr_full_text.zip– Full-page OCR output from all brochuresmap_markers.geojson– GeoJSON file for mapping businessesREADME.md– Schema and documentation for all dataset fields
These datasets are provided for historical, educational, and research purposes only.
Please cite as:
“Data derived from Atlantic City Board of Trade brochures (1936–1945), Atlantic City Board of Trade.”