AKA: Nobody knows what “Oscillation People” are talking about
The way we talk about science is really important. If someone has a good idea, he or she has to be able to communicate it to other people, and how well that works depends heavily on the words we choose. If you present at a neuroscience conference, you write an abstract that shows up with everybody else’s in the program. The goal of your abstract is to explain the gist of what you're working on, and why it matters, to someone who studies something different. Poster abstracts therefore afford a great opportunity to see how science is communicated. This is especially important in neuroscience, where people study many different but highly interrelated features of brains. I presented some of my work at the Cognitive Neuroscience Society (CNS) meeting a few months ago, and when I got there I was handed a couple-hundred-page program with information about all of the talks and posters that would be there. I was curious how much information I could gather just from the words used in the abstracts, so I started playing around with it.
In this post I’m going to walk through how I got the CNS poster abstract data from the internet, how I analyzed it, and what this kind of analysis can tell us. Along the way we will laugh. We will cry. And we will learn some data science.
Here’s what I did:
Scraping and cleaning the data.
This part is for nerds. Skip ahead to the pictures for the interesting stuff. First, I downloaded the program for the CNS 2015 annual meeting from here: http://www.cogneurosociety.org/wordpress2015/wp-content/uploads/2015/03/CNS_2015_Program.pdf
This is a PDF file. PDF files are assholes.
I used the PDF2Txt.py script from the Python module PDFMiner to pull all of the text from the PDF file and put it in a text file.
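For anyone following along, here is a minimal sketch of that extraction step in Python. It assumes the pdfminer.six package (a maintained fork of PDFMiner) and its `extract_text` helper rather than the PDF2Txt.py script itself; the function name `pdf_to_text` and the file names are made up for illustration.

```python
from pathlib import Path


def pdf_to_text(pdf_path, txt_path):
    """Extract all text from a PDF and write it to a plain-text file."""
    pdf_path, txt_path = Path(pdf_path), Path(txt_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No PDF found at {pdf_path}")
    # Assumption: pdfminer.six is installed. Imported here so that the
    # rest of a script still loads on machines without it.
    from pdfminer.high_level import extract_text

    text = extract_text(str(pdf_path))
    txt_path.write_text(text, encoding="utf-8")
    return len(text)  # number of characters extracted


# Example call (hypothetical local file names):
# pdf_to_text("CNS_2015_Program.pdf", "CNS_2015_Program.txt")
```

The output is one long text file with the same kinds of layout artifacts described next, so the cleaning still has to happen afterwards.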
Then I loaded the text file into Python and took a look. I wanted the text from the abstracts, but there was also tons of irrelevant information in there that I didn’t want, like glossaries, announcements, and advertisements. Furthermore, there were all sorts of line breaks and hyphens that randomly split some of the words that I did want. And for some reason that I can’t explain, every time there was a ‘fi’ or ‘fl’ in the text, PDF2Txt injected a space in the middle of the word. It was gross. I found where in the text the posters started and deleted everything outside that range. Then I used the poster numbers from each session (A1 - G131) to index the beginning of an abstract and some characteristic whitespace to find the end. I looped through all of the posters and saved each poster to its own text file for safekeeping. I also had to remove some page numbers and the helpful “COGNITIVE NEUROSCIENCE SOCIETY” footer sporadically littered throughout the text. Those were dark times. Some good data was lost in the process. But I came out of the cleaning process with 902 of the 912 conference abstracts mostly cleaned up.
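The cleaning steps above can be sketched roughly as follows. This is not the code from the original analysis: the regular expressions are guesses at reasonable heuristics, the names `clean_text` and `split_abstracts` are hypothetical, and each abstract here ends where the next poster number begins instead of at the characteristic whitespace mentioned above. The ‘fi’/‘fl’ repair is an optional extra; the abstracts shown in this post still contain that artifact.

```python
import re

FOOTER = "COGNITIVE NEUROSCIENCE SOCIETY"
# A poster number such as "G101" at the start of a line, followed by a space.
POSTER_ID = re.compile(r"^([A-G]\d{1,3}) ", re.MULTILINE)


def clean_text(raw):
    """Drop footers and page numbers, then repair split words."""
    lines = [
        ln for ln in raw.splitlines()
        if ln.strip() != FOOTER and not ln.strip().isdigit()
    ]
    text = "\n".join(lines)
    # Rejoin words hyphenated across a line break. Heuristic: this also
    # merges genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Close the stray space after 'fi' or 'fl' ("fi eld" -> "field").
    text = re.sub(r"(\w*f[il]) (?=[a-z])", r"\1", text)
    return text


def split_abstracts(text):
    """Return {poster_number: abstract_text}, one entry per poster."""
    starts = [(m.group(1), m.start()) for m in POSTER_ID.finditer(text)]
    abstracts = {}
    for i, (poster, start) in enumerate(starts):
        end = starts[i + 1][1] if i + 1 < len(starts) else len(text)
        # Collapse line breaks and repeated spaces into single spaces.
        abstracts[poster] = " ".join(text[start:end].split())
    return abstracts
```

Heuristics like these will misfire now and then (a line of body text that happens to start with something like "A1 " would be treated as a new poster), which fits with a handful of abstracts being lost along the way.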
Here’s what one of the abstracts looked like after my process. Note that weird ‘fi’/’fl’ thing:
G101 SPECTRAL WHITENING INFLUENCES OSCILLATORY DYNAMICS AND
BEHAVIOR IN HUMANS Torben Noto1 , Bradley Voytek2 ; 1 UCSD, 2 UCSD —
Neural oscillations play a critical role in many brain functions, including
perception, memory, executive functioning, and emotion. The scale over
which neural oscillations operate spans the microscale local fi eld potential
(LFP) of a local neuronal population to the macroscale electrocorticogram
(ECoG) and electroencephalogram (EEG). Oscillations have proved to be
a fundamental component of neural communication and network coordination,
putatively through their interactions with local population spiking
activity via spike/fi eld or phase/amplitude coupling (PAC). Importantly,
population spiking and the oscillatory frequency of a neuronal population
can be different from the spiking frequency of individual neurons in the
neural region generating the oscillation, however the interrelationship
between oscillatory coupling and PAC remains unclear. Across several
datasets, collected from multiple investigators—ranging from whole-cell
patch clamp with concurrent LFP to human subdural ECoG—we fi nd that
temporally de-correlated spiking activity: 1) Is associated with a “fl attening”
(whitening) of the LFP and ECoG power spectral density; 2) Reduces
phase amplitude coupling; and, 3) Biases perceptual behaviors such that
ongoing shifts in the spectral shape predict trial-by-trial response times.
These results are supported by computational modeling and single-unit
in vivo patch-clamp and LFP data. Thus, we have outlined a pathway by
which spiking connects the biology of oscillatory mechanics to complex
behaviors in humans.
Whatever. Good enough.
Now it was time for some simple analyses:
What words were most used to describe the posters?
Well, most words, like “the” and “of”, don’t mean much, so I decided to limit my text analysis to more neuroscience-y words that might better characterize the interesting aspects of the abstracts. Luckily the group at NeuroSynth (thanks, http://neurosynth.org/) compiled a list of 3407 useful neuroscience words like 'attention' and 'dopamine'. I took each abstract, counted each time a word from this bank was used, and put the counts into a dictionary. Now all of the posters looked like this:
{'SUPPORTED': 1, 'FUNCTIONS': 1, 'PERCEPTION': 1, 'MULTIPLE': 1, 'NETWORK': 1, 'INDIVIDUAL': 1, 'INTERACTIONS': 1, 'HUMANS': 2, 'GENERATING': 1, 'COLLECTED': 1, 'SCALE': 1, 'ONGOING': 1, 'FUNDAMENTAL': 1, 'OSCILLATIONS': 3, 'EEG': 1, 'DYNAMICS': 1, 'POPULATION': 4, 'SPECTRAL': 3, 'UNCLEAR': 1, 'COMPLEX': 1, 'COUPLING': 3, 'POWER': 1, 'PATHWAY': 1, 'EXECUTIVE': 1, 'COORDINATION': 1, 'SHAPE': 1, 'CRITICAL': 1, 'BEHAVIOR': 1, 'PHASE': 1, 'IMPORTANTLY': 1, 'FUNCTIONING': 1, 'BEHAVIORS': 2, 'PLAY': 1, 'NEURONAL': 2, 'DENSITY': 1, 'REDUCES': 1, 'HUMAN': 1, 'TIMES': 1, 'POTENTIAL': 1, 'NEURONS': 1, 'MODELING': 1, 'LOCAL': 3, 'RESPONSE': 1, 'EMOTION': 1, 'AMPLITUDE': 1, 'COMPUTATIONAL': 1, 'CONCURRENT': 1, 'PREDICT': 1, 'COMMUNICATION': 1, 'COMPONENT': 1, 'TEMPORALLY': 1, 'VIVO': 1, 'SHIFTS': 1, 'FREQUENCY': 2, 'ROLE': 1, 'INFLUENCES': 1, 'MEMORY': 1, 'PERCEPTUAL': 1, 'BIASES': 1}
(This is what that transformation did to the same poster as above)
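A minimal sketch of that counting step, assuming the word bank has been loaded into a Python set of upper-case strings. The function name `count_terms` and the three-word vocabulary in the usage note are stand-ins for illustration, not the real NeuroSynth list.

```python
import re
from collections import Counter


def count_terms(abstract, vocabulary):
    """Count how often each vocabulary word appears in one abstract."""
    # Upper-case everything and keep only runs of letters as tokens.
    tokens = re.findall(r"[A-Z]+", abstract.upper())
    return Counter(tok for tok in tokens if tok in vocabulary)
```

For example, `count_terms("Memory and memory encoding", {"MEMORY", "ENCODING", "DOPAMINE"})` gives a count of 2 for MEMORY and 1 for ENCODING, while words outside the bank, such as "and", are ignored.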
Using this format let me sum all of the mentions of each word across all 902 posters. I found that these were the most commonly used words:
Again, I analyzed 902 abstracts. Memory is mentioned 1189 times. On average, that’s more than once per abstract!
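Summing the per-poster dictionaries is a one-liner with `collections.Counter`. The sketch below uses made-up toy counts and hypothetical function names; only the 1189 and 902 in the usage note come from the actual analysis.

```python
from collections import Counter


def total_counts(per_poster_counts):
    """Add up word counts across an iterable of per-poster dictionaries."""
    totals = Counter()
    for counts in per_poster_counts:
        totals.update(counts)  # update() adds counts, it does not overwrite
    return totals


def mentions_per_abstract(total_mentions, n_abstracts):
    """Average number of mentions of a word per abstract."""
    return total_mentions / n_abstracts
```

Calling `totals.most_common(10)` on the result lists the ten most used words, and `mentions_per_abstract(1189, 902)` comes out at roughly 1.32 mentions of "memory" per abstract.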
To prove to myself that this wasn't a bogus finding caused by some outlier or problem with my cleaning process, I plotted how often each abstract used the word "memory".
It looks pretty reasonable. Some abstracts mention memory a whole lot and some don’t mention it at all. Also, the posters here are plotted in order of poster session, and the poster sessions each have a topic. It's cool that you can easily guess which sessions were about memory and which weren't.
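To get the posters into session order for a plot like this, the poster numbers need a natural sort, so that A2 comes before A10. Here is a small sketch; the function names and the dictionary layout (poster number mapped to that poster's word counts) are assumptions.

```python
import re


def session_order(poster):
    """Sort key: session letter first, then poster number as an integer."""
    letter, number = re.fullmatch(r"([A-G])(\d+)", poster).groups()
    return (letter, int(number))


def word_series(counts_by_poster, word):
    """Return [(poster, count)] for one word, in session order."""
    ordered = sorted(counts_by_poster, key=session_order)
    return [(p, counts_by_poster[p].get(word, 0)) for p in ordered]
```

The counts from `word_series(counts_by_poster, "MEMORY")` can then be handed to whatever plotting library you like, one point or bar per poster.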
This abstract used the word “memory” 16 times!
TRANSCRANIAL DIRECT CURRENT STIMULATION IMPROVES
ASSOCIATIVE MEMORY IN INDIVIDUALS WITH DEPRESSION.
Cheryl Abellanoza1, James Schaeffer1, Heekyeong Park1; 1University of Texas
- Arlington — The dorsolateral prefrontal cortex (DLPFC) is important for
both working memory and long-term memory, such that DLPFC activity
promotes associative memory by forming relational processing between
items during on-line processing. fMRI studies have shown that increased
DLPFC activity during encoding relates to successful associative memory
in normal controls. Neuropsychological patients, including depression
patients, show disorders in DLPFC activity, along with impairments in
associative memory. Transcranial direct current stimulation (tDCS) is
a noninvasive, safe, and cost-effective form of brain stimulation that is a
useful tool for examining the causal relationship between brain areas and
cognitive functions. The present study investigated whether tDCS of the
left DLPFC would enhance associative memory in individuals with depressive
symptoms. Subjects (depression, control) engaged in a double-blind,
two-session (anodal, sham) study where they studied items and
completed item and associative memory tests. tDCS was administered prior
to memory tasks. For item memory test, subjects studied a list of items
and made “old/new” recognition judgments with confi dence ratings. For
associative memory test, subjects studied word pairs and indicated if test
pairs were studied in the same pairing at study (“intact”), studied but with
different pairings (“rearranged”), or not studied (“new”). Results showed
that only the depression group showed enhanced associative memory after
anodal tDCS administration. However, such memory enhancement effects
due to tDCS were not found in item memory. Control subjects did not show
any difference due to tDCS. These fi ndings demonstrate the role of DLPFC
in associative memory and the nature of memory defi cits in depression.
Ugh. Ok. We get it. This is about memory.
Anyways, let's move on. Here's another interesting thing that I found:
I used a quick clustering algorithm to group abstracts by the words that were used. I found some categories that resemble the fields of neuroscience...kind of. The algorithm I used doesn't name the clusters so I just made those up myself.
Topic | Key terms
Topic 1: Oscillation things | STIMULATION MOTOR ALPHA THETA EEG POWER PHASE ACTION OSCILLATIONS HZ
Topic 2: Language and Speaking | LANGUAGE SENTENCES ENGLISH SENTENCE COMPREHENSION READING BILINGUALS HEMISPHERE NATIVE SPEAKERS
Topic 3: ERPology | ERP ERPS EFFECT EFFECTS RESPONSE AMPLITUDE POTENTIALS CONDITION LARGER EARLY
Topic 4: Memory and Hippocampus | MEMORY RETRIEVAL ENCODING WORKING EPISODIC RECOGNITION HIPPOCAMPUS HIPPOCAMPAL INFORMATION WM
Topic 5: Actual Cognitive Neuroscience | COGNITIVE TASK ET AL USING TASKS NEUROSCIENCE PERFORMANCE RESEARCH INDIVIDUAL
Topic 6: Reward or something? | TASK INFORMATION REWARD CONTROL RESPONSE COGNITIVE CUES TRIALS MODEL PREDICTION
Topic 7: Learning (and sleep?) | LEARNING TRAINING SLEEP LEARNED GROUP NOVEL SESSION PERFORMANCE TEST NEW
Topic 8: Old People and Young People | ADULTS AGE OLDER CHILDREN PERFORMANCE YOUNG FUNCTION YEARS DEVELOPMENT ABILITY
Topic 9: Vision and Attention | VISUAL ATTENTION STIMULI LOCATION OBJECTS SPATIAL OBJECT PRESENTED TARGET REPRESENTATIONS
Topic 10: Emotional Emotions | EMOTIONAL EMOTION STRESS NEUTRAL NEGATIVE STIMULI ANXIETY AFFECTIVE COGNITIVE POSITIVE
Topic 11: Cortical Networks | NETWORK CONNECTIVITY FRONTAL NETWORKS GYRUS PREFRONTAL INFERIOR ANTERIOR TEMPORAL PARIETAL
Topic 12: Words | WORDS WORD SEMANTIC EFFECT EFFECTS PRESENTED INCONGRUENT RECOGNITION PRIMING LEXICAL
Topic 13: Faces | FACES SOCIAL FACE ASD PERCEPTION FACIAL INDIVIDUALS STIMULI PRESENTED AUTISM
Topic 14: Brain Structure | CONTROLS MATTER COGNITIVE VOLUME HEALTHY WHITE CORTICAL INDIVIDUALS DISEASE STRUCTURAL
Topic 15: Hearing | AUDITORY SPEECH FEEDBACK MUSIC PERCEPTION PITCH TEMPORAL SOUNDS PERCEPTUAL FREQUENCY
Torben. What the hell am I looking at?
Good question, dear reader. I used Latent Dirichlet Allocation (LDA; https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) to cluster all of the abstracts into categories based on their content. Basically, I told this algorithm to split the abstracts into 15 groups, fifteen being a completely arbitrary number. The groups were decided based on how specific certain words are to certain categories. I displayed the 10 words that most characterized each group, the first word being the most characteristic.
It is important to point out that LDA is sensitive to the parameters that I used behind the scenes. For example, I ran the same analysis with different numbers of categories to see how that would affect the output. It shuffled and mixed these categories around a little bit but the gist was the same.
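This post doesn't say which LDA implementation was used, so to show what the algorithm is doing under the hood, here is a toy collapsed Gibbs sampler in pure Python. Everything about it is an assumption made for illustration: the function names, the hyperparameters `alpha` and `beta`, the iteration count, and the choice of Gibbs sampling itself. For real data a library implementation would be the better choice.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs is a list of token lists. Returns (topic_word, doc_topic), where
    topic_word[k][w] counts how often word w is assigned to topic k and
    doc_topic[d][k] counts how many tokens of document d sit in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def shift(d, w, k, step):
        # Add (step=+1) or remove (step=-1) one token from the count tables.
        doc_topic[d][k] += step
        topic_word[k][w] += step
        topic_total[k] += step

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, zs):
            shift(d, w, k, +1)
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                shift(d, w, assignments[d][i], -1)
                # P(topic k | everything else) is proportional to
                # (doc-topic count + alpha) * (smoothed word share in k).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * len(vocab))
                    for k in range(n_topics)
                ]
                k_new = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k_new
                shift(d, w, k_new, +1)
    return topic_word, doc_topic


def top_words(topic_word, n=10):
    """The n most frequent words in each topic, most frequent first."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]
```

Changing `n_topics`, the seed, or the two smoothing parameters reshuffles the groups, which is the parameter sensitivity described above. The `doc_topic` counts also show how a single abstract can be spread across several topics at once.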
Looks like my paper fits into the cognitive neuroscience, hearing(?) and oscillation categories. I thought my paper would be more in the Oscillation category but I used a lot of 'Cognitive' words and words like 'perception' and 'auditory' from the Hearing category so it makes sense that my paper would fall into a couple categories.
What does all of this tell us?
Well first of all it is pretty cool that the categories that I found mostly reflect the submission categories for CNS. You can clearly see the canonical fields of study like memory and vision reflected by these clusters. But there's more here. I think it's a funny easter egg that ET and AL showed up in the Cognitive category. For those of you who don't know, you put "et al." in your byline if you have many authors. This suggests that much larger research groups are working on these kinds of problems. Hmm. Weird.
One idea from this analysis is that maybe conferences should use LDA from the previous year to decide what categories they should use for poster sessions. For example, there are people really interested in the structure of the brain but there wasn’t a specific session for only that. It would be cool if the structure of science could update itself in real time.
But what is super, super interesting is that the people studying oscillations are the first group. This means that the vocabulary they use is the most specific. (The distribution of topics has a Dirichlet prior, meaning it tends to look roughly log-normal.) It’s literally like they are speaking a different language than everybody else! This means that very few people in other fields are talking about things like oscillations and the power spectrum, at least in their abstracts, with regard to their more specific field of study. My work was one of the few posters that partly fell into the oscillation category, and in my anecdotal experience, I only connected with a few other people interested in these kinds of things at this conference.
The Upshot
The lesson here is that it’s often helpful to take a step back and look at the big picture. Science happens in a social and linguistic context, and not enough scientists who participate in this system recognize this fact. Doing meta-analyses of our process can show us things that aren't evident at smaller scales. In this case, scientists seemed more interested in memory than cognition at this 'cognitive neuroscience' conference, and few people were using an 'oscillation' vocabulary in their work. It would behoove us all to be mindful of how we structure our science, communicate our ideas, and listen to other perspectives.
If you're interested, the cleaned abstract metadata and some of the interesting parts of the code can be found here: https://github.com/torbenator/cns_analysis