Updated: 10/21/2019

------------

The data/materials of the LAP Web site are provided to users at the cost of reproduction (at no cost for materials downloaded from the LAP Web site). Users may copy, display, or give away the data/materials. Users should not resell them in whole or in part, because they are governed by the "cost of reproduction" condition. Users should indicate in any subsequent publication using LAP data/materials that they have obtained the data/materials from the LAP Web site, and indicate that reproduction, copying, distribution, display, etc., as the case may be, of any LAP data/materials is governed by the "cost of reproduction" condition.

Any use of the DASS audio, transcriptions, or annotations should be cited as:

William A. Kretzschmar, Jr., Margaret E. L. Renwick, Lisa M. Lipani, Michael L. Olsen, Rachel M. Olsen, Yuanming Shi, and Joseph A. Stanley (2019) Transcriptions of the Digital Archive of Southern Speech. Linguistic Atlas Project, University of Georgia. http://www.lap.uga.edu/Projects/DASS2019/

Additional useful sources: http://lap3.libs.uga.edu/u/jstanley/vowelcharts/

Special thanks to Chris Koops at the University of New Mexico for feedback on our finalized data set.

------------

This is the full transcribed Digital Archive of Southern Speech (DASS) in .xml, .txt, and .TextGrid formats, as transcribed by the Linguistic Atlas Project at the University of Georgia and funded by a grant from the National Science Foundation (Project Title: Automated Large-Scale Phonetic Analysis: DASS Pilot; #1625680, PIs Dr. William Kretzschmar and Dr. Margaret E. L. Renwick). DASS comprises 64 speakers and is a subset of the Linguistic Atlas of the Gulf States (LAGS). Speakers are presented here with their LAGS-assigned informant numbers.
Five of the original DASS interviews were too difficult to transcribe because of poor audio quality, so we replaced them with five LAGS speakers chosen to closely match the original DASS speakers in social characteristics. The five speakers included here who were not part of the original DASS are: 030 (replacing LAGS speaker 052), 255 (replacing LAGS speaker 263), 370B (replacing LAGS speaker 356), 596 (replacing LAGS speaker 609A), and 811 (replacing LAGS speaker 802).

Each speaker's interview is split into reels (a relic of the original recording of interviews on reel-to-reel and cassette tapes), and each DASS interview contains between 2 and 13 reels. Files are labeled as follows: LAGSspeakernumber_reelnumber.txt, LAGSspeakernumber_reelnumber.TextGrid, and LAGSspeakernumber_reelnumber.xml. The corpus also includes full .txt and .xml versions of each speaker's interviews (LAGSspeakernumber_full.txt and LAGSspeakernumber_full.xml), as well as files containing all of the DASS speakers' interviews (DASS_full.txt and DASS_full.xml).

The .txt files read as transcripts of the interviews. Speech enclosed in pound signs (# #) indicates overlapping speech.

The .xml files are time-stamped and correspond to the audio. They can be opened with the program Transcriber (which outputs .trs files, a kind of .xml file) or with a text editor such as Notepad++. Although we provide full-speaker and full-DASS .xml files, note that the time stamps correspond to the audio of individual reels. To open an .xml file in sync with an audio file, you must therefore use an individual reel's .xml (e.g., use 025_1.xml, not 025_full.xml or DASS_full.xml). When a file is opened in a text editor like Notepad++, the top of each file contains information about the speakers and the file itself. This introductory information is kept at the start of each reel's portion, even in each speaker's concatenated full-interview .xml and in DASS_full.xml.
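Because the time stamps are relative to each reel's audio, a script that pairs text with audio should read the per-reel .xml files. The Python sketch below assumes those files follow Transcriber's usual .trs layout (<Turn> elements with startTime/endTime attributes and <Sync time="..."/> marks inside each turn); this README does not spell out the element names, so check them against an actual file such as 025_1.xml. The names Segment, segments_from_root, and read_segments are hypothetical, chosen for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # Transcriber speaker id (empty if the turn has none)
    start: float  # seconds from the start of this reel's audio
    end: float
    text: str


def segments_from_root(root: ET.Element) -> list[Segment]:
    """Split each <Turn> into segments bounded by its <Sync time="..."/> marks.

    Assumes Transcriber-style markup; element/attribute names are not
    confirmed by the DASS README.
    """
    segments: list[Segment] = []
    for turn in root.iter("Turn"):
        speaker = turn.get("speaker", "")
        turn_end = float(turn.get("endTime"))
        # Each mark is (start time, text pieces that follow it).
        current_start = float(turn.get("startTime"))
        pieces = [turn.text or ""]
        marks = []
        for child in turn:
            if child.tag == "Sync":
                marks.append((current_start, pieces))  # close the previous stretch
                current_start = float(child.get("time"))
                pieces = []
            # Text following any child element (Sync or otherwise) is in its .tail.
            pieces.append(child.tail or "")
        marks.append((current_start, pieces))
        for i, (start, parts) in enumerate(marks):
            end = marks[i + 1][0] if i + 1 < len(marks) else turn_end
            text = " ".join("".join(parts).split())
            if text:  # drop empty stretches, e.g. whitespace before the first Sync
                segments.append(Segment(speaker, start, end, text))
    return segments


def read_segments(path: str) -> list[Segment]:
    """Parse one per-reel .xml file (not a _full file) into segments."""
    return segments_from_root(ET.parse(path).getroot())
```

For example, read_segments("025_1.xml") would return the turns of reel 1 as (speaker, start, end, text) records whose times can be looked up directly in that reel's .wav file. The curly-bracket codes and # # overlap markers are left in the text so they can be handled separately.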
Because there are 409 reels in DASS, this introductory information appears 409 times in DASS_full.xml, once for each reel, and its contents are specific to that reel.

The corpus also includes force-aligned TextGrids for each speaker's reels, produced by the Montreal Forced Aligner. Only the informant's speech (not the interviewer's) is force-aligned.

The following codes, always enclosed in curly brackets { }, are used throughout the .xml and .txt transcriptions:

    {X}    unintelligible
    {D:}   doubt: the transcriber thinks they may have understood what was said but isn't sure (e.g., {D: painted fence})
    {C:}   comment (e.g., {C: informant is making a point of pronouncing this word in a certain way, but doesn't actually pronounce it this way})
    {NS}   non-speech (e.g., phone ringing, dog barking, door closing)
    {NW}   non-word (e.g., cough, laugh, sneeze)

At http://www.lap.uga.edu/Projects/DASS2019/, there are three folders:

Bios
    LAGS biography of each speaker

Info
    DASS_demographic_information.xlsx
        This Excel file contains: LAGSspeakernumber, sector# (assigned by the original LAGS project), grid unit, county, town, state, sex, age at time of interview, age level, social status, education, ethnicity, worldview (assigned by the original LAGS interviewers), land region, DASS type, and LAGS biography. A legend for each category is given in the heading of the file itself.
    DASS_narratives_topics.xlsx
        This Excel file describes the contents of each reel: the topics discussed, and whether the conversation features a longer narrative by the informant (at least 1.5 minutes), questions and answers, or the interviewer speaking. Included are the LAGSspeakernumber, reel, part of the reel, narrative/Q&A/I, and topic label. The part of the reel corresponds to the .mp3 audio available on the Linguistic Atlas Project website (http://www.lap.uga.edu/Projects/LAGS/).
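The curly-bracket codes follow a regular enough pattern to be matched with a regular expression. The Python sketch below finds the codes in a line of transcription and produces a plain-word version of it. Two choices are assumptions rather than corpus conventions: words marked as doubtful ({D: ...}) are kept by default, and the pound signs marking overlapping speech are simply removed. CODE_RE, find_codes, and clean_line are hypothetical names.

```python
from __future__ import annotations

import re

# {X}, {NS}, {NW}, {D: ...} and {C: ...}; the ":content" part is allowed on
# every code in case a transcriber annotated {NS}/{NW} too (an assumption).
CODE_RE = re.compile(r"\{(X|D|C|NS|NW)(?::\s*([^{}]*))?\}")


def find_codes(line: str) -> list[tuple[str, str]]:
    """Return (code, content) pairs in order of appearance."""
    return [(m.group(1), (m.group(2) or "").strip()) for m in CODE_RE.finditer(line)]


def clean_line(line: str, keep_doubtful: bool = True) -> str:
    """Remove codes and # overlap markers, leaving only transcribed words."""

    def replace(m: re.Match) -> str:
        if m.group(1) == "D" and keep_doubtful:
            return " " + (m.group(2) or "").strip() + " "  # keep the doubtful words
        return " "  # drop {X}, {NS}, {NW}, comments, and (optionally) doubts

    text = CODE_RE.sub(replace, line)
    text = text.replace("#", " ")  # pound signs only mark overlapping speech
    return " ".join(text.split())
```

Whether to keep doubtful words depends on the analysis; for word counts or alignment checks, passing keep_doubtful=False gives a more conservative result.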
Speakers
    Each individual speaker folder, labeled by LAGSspeakernumber, contains the following:
    Audio
        .wav files for each reel for that speaker
    Text
        .txt interviews for each reel
        _full.txt full speaker interview
        LAGS bio for the speaker
    TextGrid
        .TextGrid files for each reel
    xml
        .xml interviews for each reel
        .xml for the full speaker interview
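Since each reel's audio, transcript, TextGrid, and .xml share the LAGSspeakernumber_reelnumber stem, they can be matched by file name. The Python sketch below parses names of that form and groups a speaker's per-reel files by reel number, skipping the _full files (which span several reels' audio) and any file that does not match the pattern, such as the bio. It walks the speaker folder recursively rather than relying on the exact subfolder names. parse_name, reels_by_number, and speaker_reels are hypothetical helpers, and the pattern assumes speaker numbers are digits with an optional trailing capital letter (as in 370B).

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable, NamedTuple, Optional

# LAGSspeakernumber_reelnumber.ext or LAGSspeakernumber_full.ext.
# Assumption: speaker numbers are digits plus an optional capital letter (e.g. 370B).
NAME_RE = re.compile(
    r"^(?P<speaker>\d+[A-Z]?)_(?P<reel>\d+|full)\.(?P<ext>wav|txt|xml|TextGrid)$"
)


class CorpusFile(NamedTuple):
    speaker: str
    reel: Optional[int]  # None for the concatenated _full files
    ext: str


def parse_name(name: str) -> Optional[CorpusFile]:
    """Parse a DASS file name; return None for names outside the pattern."""
    m = NAME_RE.match(name)
    if m is None:
        return None  # e.g. the bio file or DASS_full.xml
    reel = None if m["reel"] == "full" else int(m["reel"])
    return CorpusFile(m["speaker"], reel, m["ext"])


def reels_by_number(paths: Iterable[Path]) -> dict[int, dict[str, Path]]:
    """Group per-reel files as {reel: {extension: path}}."""
    reels: dict[int, dict[str, Path]] = {}
    for path in paths:
        info = parse_name(path.name)
        if info is None or info.reel is None:
            continue  # skip _full files: they span several reels' audio
        reels.setdefault(info.reel, {})[info.ext] = path
    return reels


def speaker_reels(speaker_dir: Path) -> dict[int, dict[str, Path]]:
    """Collect the reels of one speaker folder (searched recursively)."""
    return reels_by_number(p for p in speaker_dir.rglob("*") if p.is_file())
```

For instance, speaker_reels(Path("370B")) would map each reel number to its files, so that reels[1]["wav"] and reels[1]["xml"] can be opened together in Transcriber or passed to an analysis script.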