
dimitryslavin/edgar-10k-sa

 
 


2017.5.11: For Edgar MD&A Extraction, see edgar-10k-mda


edgar-10k-sa


I. Download & extract MDA from EDGAR 10-K forms

To see all command-line options, run: python crawl10k.py -h

  1. Class FormIndex:
     - Download the full indexes for the given year range (they contain the URLs of the 10-K form files)
     - Save them to a CSV file

  2. Class Form:
     - Download each form over HTTP requests (EDGAR's FTP service has been closed since 2017), using the previously downloaded form indexes
     - The 10-K forms are stored in HTML format, so use BeautifulSoup to parse the raw HTML and preprocess the text to make the MDA section easier to find
     - Save the text to the txt dir as 'filename.txt'

  3. Class MDAParser:
     - Try to extract the MDA section from the preprocessed text
     - Save the result to the mda dir as 'filename.mda'
     - Save the parsing results to 'parsing.log', which shows SUCCESS/FAILURE for each file
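
The index filtering and MDA extraction steps above can be sketched offline. This is a minimal illustration, not the code in crawl10k.py. It assumes the index is pipe-delimited like EDGAR's master.idx files (CIK|Company Name|Form Type|Date Filed|Filename) and that the MDA section runs from an "Item 7" heading to the next "Item 8" heading. The function names, regular expressions, and sample rows are hypothetical.

```python
import csv
import io
import re

# Made-up rows in the assumed master.idx layout:
# CIK|Company Name|Form Type|Date Filed|Filename
SAMPLE_INDEX = """\
CIK|Company Name|Form Type|Date Filed|Filename
1000045|EXAMPLE CORP|10-K|2016-03-01|edgar/data/1000045/0001000045-16-000001.txt
1000045|EXAMPLE CORP|8-K|2016-04-01|edgar/data/1000045/0001000045-16-000002.txt
"""

# Made-up preprocessed text: a table of contents followed by the real sections.
SAMPLE_TEXT = (
    "Item 7. Management's Discussion and Analysis ..... 20\n"
    "Item 8. Financial Statements ..... 35\n"
    "Item 7. Management's Discussion and Analysis\n"
    "Revenue increased during the year.\n"
    "Item 8. Financial Statements\n"
)

# ".?" after "management" tolerates straight or curly apostrophes.
MDA_START = re.compile(
    r"item\s+7\.?\s+management.?s\s+discussion\s+and\s+analysis", re.I
)
MDA_END = re.compile(r"item\s+8\.?\s+financial\s+statements", re.I)


def filter_10k_rows(index_text):
    """Return (cik, date_filed, url) tuples for rows whose form type is 10-K."""
    reader = csv.DictReader(io.StringIO(index_text), delimiter="|")
    rows = []
    for row in reader:
        if row["Form Type"] == "10-K":
            url = "https://www.sec.gov/Archives/" + row["Filename"]
            rows.append((row["CIK"], row["Date Filed"], url))
    return rows


def extract_mda(text):
    """Return the text from the last Item 7 heading up to the next Item 8
    heading, or None if either heading is missing."""
    starts = list(MDA_START.finditer(text))
    if not starts:
        return None
    # Heuristic: the table of contents mentions Item 7 first,
    # so the last match is taken as the real section heading.
    start = starts[-1]
    end = MDA_END.search(text, start.end())
    if end is None:
        return None
    return text[start.start():end.start()].strip()


rows = filter_10k_rows(SAMPLE_INDEX)
mda = extract_mda(SAMPLE_TEXT)
```

Taking the last "Item 7" match is a simplification. Filings that mention Item 7 again later in the document would need a stricter rule, which is one reason a per-file SUCCESS/FAILURE log is useful.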

II. Sentiment Analysis with Bill McDonald's Code (Code can be found at http://sraf.nd.edu/textual-analysis/)

  1. Specify the mda files, the dictionary file, and the result CSV file in Generic_Parser.py
  2. Run 'python Generic_Parser.py'
  3. The code has been modified to add the CIK for this repo (the CIK is included in the filename produced in the first section)
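
The general idea of dictionary-based scoring can be sketched as follows. This is not Generic_Parser.py. The word lists are tiny stand-ins rather than the actual dictionary file, the output column names are hypothetical, and the CIK is passed in directly because the filename format is not shown here.

```python
import csv
import io
import re

# Stand-in word lists for illustration only; the real dictionary is far larger.
NEGATIVE = {"LOSS", "DECLINE", "IMPAIRMENT"}
POSITIVE = {"GAIN", "IMPROVE", "STRONG"}


def score_text(text):
    """Count total, negative, and positive words in a piece of text."""
    tokens = re.findall(r"[A-Za-z]+", text.upper())
    return {
        "words": len(tokens),
        "negative": sum(t in NEGATIVE for t in tokens),
        "positive": sum(t in POSITIVE for t in tokens),
    }


def write_results(rows, out):
    """Write one CSV row per document, with the CIK as the first column."""
    writer = csv.DictWriter(
        out, fieldnames=["cik", "words", "negative", "positive"]
    )
    writer.writeheader()
    writer.writerows(rows)


# Usage with a made-up CIK and sentence.
scores = score_text("Strong gain offset a loss. Loss widened.")
buffer = io.StringIO()
write_results([{"cik": "1000045", **scores}], buffer)
```

Keeping the CIK in each output row lets the sentiment counts be joined back to other company-level data.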
