I recently came across a simple little application for some of Python's amazing regular expression tools that I thought I would share.
Like many academics, I use BibTeX to manage citations in papers, which makes adding and citing references a breeze. Even better, many web sources, including Google Scholar, automatically generate the BibTeX code for you, which you can paste directly into your .bib file. For a s̶h̶a̶m̶e̶l̶e̶s̶s̶ ̶p̶l̶u̶g̶ relevant example, the automatically generated BibTeX code for "Rare Shocks, Great Recessions" is:

    @article{curdia2014rare,
      title={Rare shocks, great recessions},
      author={C{\'u}rdia, Vasco and Negro, Marco and Greenwald, Daniel L},
      journal={Journal of Applied Econometrics},
      volume={29},
      number={7},
      pages={1031--1052},
      year={2014},
      publisher={Wiley Online Library}
    }

Everything is perfect except for one tiny thing... the title isn't capitalized! I prefer "Rare Shocks, Great Recessions." Most automatically generated BibTeX entries leave titles in sentence case, so changing every title in my bibliography would mean a ton of manual editing. Instead, I wrote a little Python script that takes care of this using regular expressions and the titlecase module. Enjoy!

    """This script replaces lowercase titles in BibTeX entries with titlecase,
    e.g., 'The greatest title of all time' to 'The Greatest Title of All Time'."""

    import re
    from titlecase import titlecase

    # Set path and name of bib files
    directory = '/path/to/my/file/'
    my_file = 'my_bibfile.bib'
    new_file = 'new_bibfile.bib'  # in case you don't want to overwrite

    # Match title or journal segment allowing for whitespace (UPDATED 8/12/19)
    pattern = re.compile(r'(\W*)(title|journal)\s*=\s*{(.*)},')

    # Read in old file
    with open(directory + my_file, 'r') as fid:
        lines = fid.readlines()

    # Search for title strings and replace with titlecase
    newlines = []
    for line in lines:
        # Check if line contains a title or journal field
        match_obj = pattern.match(line)
        if match_obj is not None:
            # Apply titlecase to get the correct title.
            newtitle = titlecase(match_obj.group(3))
            # Splice the new title in by position. A second regex substitution
            # would treat backslashes in the title (e.g. LaTeX accents or
            # commands) as escape sequences in the replacement string.
            newline = (line[:match_obj.start(3)] + newtitle
                       + line[match_obj.end(3):])
            newlines.append(newline)
        else:
            # If not title, add as is.
            newlines.append(line)

    # Print output to new file
    with open(directory + new_file, 'w') as fid:
        fid.writelines(newlines)
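To see which lines the pattern from the script actually touches, here is a minimal sketch that runs it over a few sample .bib lines. Two things here are my assumptions and not part of the script: the helper name `retitle` is made up for illustration, and Python's built-in `str.title` stands in for `titlecase` so the snippet runs with the standard library alone (unlike `titlecase`, it also capitalizes small words such as "of").

```python
import re

# Same pattern as in the script above.
pattern = re.compile(r'(\W*)(title|journal)\s*=\s*{(.*)},')


def retitle(line, convert=str.title):
    """Return `line` with its title/journal value passed through `convert`.

    `retitle` is a hypothetical helper, and `convert` defaults to str.title
    only as a stand-in for titlecase.titlecase.
    """
    m = pattern.match(line)
    if m is None:
        return line  # not a title or journal line: leave untouched
    # Splice by match position so backslashes in the title are copied
    # literally and never interpreted as regex replacement escapes.
    return line[:m.start(3)] + convert(m.group(3)) + line[m.end(3):]


sample = [
    "  title={Rare shocks, great recessions},\n",
    "  author={C{\\'u}rdia, Vasco and Negro, Marco and Greenwald, Daniel L},\n",
    "  journal = {Journal of applied econometrics},\n",
    "  booktitle={Some proceedings},\n",
]

for line in sample:
    print(retitle(line), end="")
```

Only the `title` and `journal` lines change. The `author` line is left alone, and so is `booktitle`, because `pattern.match` anchors at the start of the line and the field name has to begin right after any leading non-word characters.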
2 Comments
7/7/2016 12:19:22 pm
Great script! I modified it slightly to deal with both title and book title fields, with a slightly more flexible re pattern:
2/16/2017 12:52:55 am
This article is just what we needed for capitalizing titles in BibTeX article and misc entries. Everyone will be happy to have your great reviews and blog updates, and every user will be happy to find your great articles.