Thursday, September 17, 2015

The 2015 CNS Annual Meeting should be called: ‘The Meeting for Memory and Also Some Other Things’

AKA: Nobody knows what “Oscillation People” are talking about



The way we talk about science is really important. If someone has a good idea, he or she has to be able to communicate it to other people, and that depends critically on the words we choose. If you present at a neuroscience conference, you write an abstract that shows up in the program alongside everybody else’s. The goal of your abstract is to explain the gist of what you're working on, and why it matters, to someone who studies something different. This is especially important in neuroscience, where people study many different but highly interrelated features of brains, and it makes poster abstracts a great window into how science is communicated. I presented some of my work at the Cognitive Neuroscience Society meeting a few months ago, and when I got there I was handed a couple-hundred-page program with information about all of the talks and posters. I was curious how much information I could gather just from the words used in the abstracts, so I started playing around with it.


In this post I’m going to walk through how I got the CNS poster abstract data from the internet, how I analyzed it, and what this kind of analysis can tell us. Along the way we will laugh. We will cry. And we will learn some data science.

Here’s what I did:


Scraping and cleaning the data.

This part is for nerds. Skip ahead to the pictures for the interesting stuff. First, I downloaded the program for the CNS 2015 annual meeting from here: http://www.cogneurosociety.org/wordpress2015/wp-content/uploads/2015/03/CNS_2015_Program.pdf

This is a PDF file. PDF files are assholes.

I used the PDF2Txt.py tool from PDFMiner, a Python module, to pull all of the text out of the PDF file and into a text file.
Then I loaded the text file into Python and took a look. I wanted the text of the abstracts, but it was mixed in with tons of irrelevant material like glossaries, announcements, and advertisements. Furthermore, line breaks and hyphens randomly split some of the words I did want. And for some reason that I can’t explain, every time there was a ‘fi’ or ‘fl’ in the text, PDF2Txt injected a space in the middle of the word. It was gross.

I found where in the text the posters started and deleted everything outside that range. Then I used the poster numbers (A1 to G131) to find the beginning of each abstract and some characteristic whitespace to find the end. I looped through all of the posters and saved each one to its own text file for safekeeping. I also had to remove page numbers and the helpful “COGNITIVE NEUROSCIENCE SOCIETY” footer sporadically littered throughout the text. Those were dark times. Some good data was lost in the process, but I came out of the cleaning with 902 of the 912 conference abstracts mostly cleaned up.
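The cleaning steps above can be sketched roughly as follows. This is a minimal illustration and not the code in the linked repository. The footer string, the page-number rule, and the poster-code pattern are all assumptions based on the description and the sample below. The poster-code pattern expects a letter A to G followed by up to three digits at the start of a line. The ‘fi’/‘fl’ fix is a blunt heuristic: it would also glue a genuine word ending in ‘fi’ or ‘fl’ onto the word after it.

```python
import re

# Assumed running footer text; the real program may vary its spacing or case.
FOOTER = "COGNITIVE NEUROSCIENCE SOCIETY"

# Assumed poster-code format: a session letter A-G plus up to three digits,
# at the start of a line and followed by a space ("G101 SPECTRAL ...").
POSTER_CODE = re.compile(r"^([A-G]\d{1,3}) ", re.MULTILINE)


def split_abstracts(program_text):
    """Split the raw poster section into {poster_code: raw abstract text}."""
    matches = list(POSTER_CODE.finditer(program_text))
    abstracts = {}
    for i, m in enumerate(matches):
        # Each abstract runs until the next poster code (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(program_text)
        abstracts[m.group(1)] = program_text[m.end():end]
    return abstracts


def clean_text(raw):
    """Undo some common PDF-extraction damage in one abstract."""
    text = raw.replace(FOOTER, " ")
    # Drop lines that hold nothing but a page number.
    text = re.sub(r"^[ \t]*\d+[ \t]*$", "", text, flags=re.MULTILINE)
    # Rejoin words hyphenated across a line break: "poten-" + newline + "tial" -> "potential".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Close the space injected after 'fi'/'fl': "fi eld" -> "field".
    text = re.sub(r"(fi|fl) (?=[a-z])", r"\1", text)
    # Collapse remaining whitespace, including line breaks, to single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

Split the raw poster section first, because the splitter relies on line starts, and then run `clean_text` on each abstract. One trade-off: rejoining hyphenated line breaks also collapses a genuine compound like ‘whole-cell’ if it happens to break at its hyphen.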

Here’s what one of the abstracts looked like after my process. Note that weird ‘fi’/‘fl’ thing:

G101 SPECTRAL WHITENING INFLUENCES OSCILLATORY DYNAMICS AND 
BEHAVIOR IN HUMANS Torben Noto1 , Bradley Voytek2 ; 1 UCSD, 2 UCSD —
Neural oscillations play a critical role in many brain functions, including
perception, memory, executive functioning, and emotion. The scale over
which neural oscillations operate spans the microscale local fi eld potential
(LFP) of a local neuronal population to the macroscale electrocorticogram
(ECoG) and electroencephalogram (EEG). Oscillations have proved to be
a fundamental component of neural communication and network coordination, 
putatively through their interactions with local population spiking
activity via spike/fi eld or phase/amplitude coupling (PAC). Importantly,
population spiking and the oscillatory frequency of a neuronal population
can be different from the spiking frequency of individual neurons in the
neural  region  generating  the  oscillation,  however  the  interrelationship
between  oscillatory  coupling  and  PAC  remains  unclear.  Across  several
datasets,  collected  from  multiple  investigators—ranging  from  whole-cell
patch clamp with concurrent LFP to human subdural ECoG—we fi nd that
temporally de-correlated spiking activity: 1) Is associated with a “fl attening” 
(whitening) of the LFP and ECoG power spectral density; 2) Reduces
phase amplitude coupling; and, 3) Biases perceptual behaviors such that
ongoing  shifts  in  the  spectral  shape  predict  trial-by-trial  response  times.
These  results  are  supported  by  computational  modeling  and  single-unit
in vivo patch-clamp and LFP data. Thus, we have outlined a pathway by
which  spiking  connects  the  biology  of  oscillatory  mechanics  to  complex
behaviors in humans.


Whatever. Good enough.


Now it was time for some simple analyses:

What words were most used to describe the posters? 

Most words, like “the” and “of”, don’t mean much, so I decided to limit my text analysis to more neuroscience-y words that might better characterize the interesting parts of the abstracts. Luckily, the group at NeuroSynth (thanks, http://neurosynth.org/) compiled a list of 3407 useful neuroscience words like 'attention' and 'dopamine'. For each abstract, I counted every time a word from this list was used and stored the counts in a dictionary. Now each poster looked like this:

{'SUPPORTED': 1, 'FUNCTIONS': 1, 'PERCEPTION': 1, 'MULTIPLE': 1, 'NETWORK': 1, 'INDIVIDUAL': 1, 'INTERACTIONS': 1, 'HUMANS': 2, 'GENERATING': 1, 'COLLECTED': 1, 'SCALE': 1, 'ONGOING': 1, 'FUNDAMENTAL': 1, 'OSCILLATIONS': 3, 'EEG': 1, 'DYNAMICS': 1, 'POPULATION': 4, 'SPECTRAL': 3, 'UNCLEAR': 1, 'COMPLEX': 1, 'COUPLING': 3, 'POWER': 1, 'PATHWAY': 1, 'EXECUTIVE': 1, 'COORDINATION': 1, 'SHAPE': 1, 'CRITICAL': 1, 'BEHAVIOR': 1, 'PHASE': 1, 'IMPORTANTLY': 1, 'FUNCTIONING': 1, 'BEHAVIORS': 2, 'PLAY': 1, 'NEURONAL': 2, 'DENSITY': 1, 'REDUCES': 1, 'HUMAN': 1, 'TIMES': 1, 'POTENTIAL': 1, 'NEURONS': 1, 'MODELING': 1, 'LOCAL': 3, 'RESPONSE': 1, 'EMOTION': 1, 'AMPLITUDE': 1, 'COMPUTATIONAL': 1, 'CONCURRENT': 1, 'PREDICT': 1, 'COMMUNICATION': 1, 'COMPONENT': 1, 'TEMPORALLY': 1, 'VIVO': 1, 'SHIFTS': 1, 'FREQUENCY': 2, 'ROLE': 1, 'INFLUENCES': 1, 'MEMORY': 1, 'PERCEPTUAL': 1, 'BIASES': 1}

(This is what that transformation did to the same poster as above)
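Turning an abstract into one of these dictionaries takes only a few lines with Python’s `collections.Counter`. Here is a sketch. It assumes the NeuroSynth terms are already loaded into a set of upper-case single words. How the list is downloaded and stored isn’t shown, and multi-word terms would need extra n-gram matching.

```python
import re
from collections import Counter


def count_vocab_words(abstract, vocabulary):
    """Count how often each vocabulary term appears in one abstract.

    `vocabulary` is assumed to be a set of upper-case single words standing in
    for the NeuroSynth term list; multi-word terms are not handled here.
    """
    # Upper-case everything, then split on anything that is not a letter.
    tokens = re.findall(r"[A-Z]+", abstract.upper())
    return Counter(token for token in tokens if token in vocabulary)
```

Because tokens are split on non-letters, a leftover ‘fi eld’ becomes the non-words FI and ELD and simply drops out of the counts.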


This format let me sum the mentions of each word across all 902 posters. These were the most commonly used words:



Again, I analyzed 902 abstracts, and memory is mentioned 1189 times. That’s about 1.3 mentions per abstract (1189 / 902), so on average memory comes up more than once per abstract!
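Summing the per-abstract dictionaries is again a job for `Counter`. The sketch below also counts how many abstracts mention each word at least once. That helps separate a genuinely common word from one inflated by a handful of abstracts. `document_frequency` is an illustrative helper, not part of the original analysis.

```python
from collections import Counter


def corpus_totals(per_abstract_counts):
    """Sum per-abstract word counts into one corpus-wide tally."""
    total = Counter()
    for counts in per_abstract_counts:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total


def document_frequency(per_abstract_counts):
    """Number of abstracts that mention each word at least once."""
    df = Counter()
    for counts in per_abstract_counts:
        df.update(word for word, n in counts.items() if n > 0)
    return df
```

`corpus_totals(all_counts).most_common(10)` gives a top-10 list. Dividing a word’s total by the number of abstracts gives the per-abstract average used above.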


To prove to myself that this wasn't a bogus finding caused by some outlier or a problem with my cleaning process, I plotted how often each abstract used the word "memory".

It looks pretty reasonable. Some abstracts mention memory a whole lot and some don’t mention it at all. The posters are plotted in order of poster session, and each session has a topic, so it’s cool that you can easily guess which sessions were about memory and which weren't.
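Plotting in poster-session order needs one small trick: sorted as plain strings, ‘A10’ comes before ‘A2’. The sketch below sorts poster codes by session letter and then numerically, and lines up one word’s counts in that order. The input format, a dict from poster code to that poster’s word counts, is an assumption.

```python
import re


def poster_sort_key(code):
    """Sort poster codes by session letter, then numerically ("A2" before "A10")."""
    m = re.fullmatch(r"([A-Z])(\d+)", code)
    if m is None:
        raise ValueError(f"unexpected poster code: {code!r}")
    return m.group(1), int(m.group(2))


def word_series(counts_by_poster, word):
    """Return (poster codes, counts of `word`) in poster-session order, ready to plot."""
    codes = sorted(counts_by_poster, key=poster_sort_key)
    return codes, [counts_by_poster[c].get(word, 0) for c in codes]
```

The two returned lists can go straight into a bar chart, for example with matplotlib’s `plt.bar(range(len(counts)), counts)`.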


This abstract used the word “memory” 16 times!

TRANSCRANIAL  DIRECT  CURRENT  STIMULATION  IMPROVES
ASSOCIATIVE  MEMORY  IN  INDIVIDUALS  WITH  DEPRESSION.
Cheryl   Abellanoza1,  James   Schaeffer1,  Heekyeong   Park1;  1University  of  Texas
- Arlington — The  dorsolateral  prefrontal  cortex  (DLPFC)  is  important  for
both working memory and long-term memory, such that DLPFC activity
promotes  associative  memory  by  forming  relational  processing  between
items during on-line processing. fMRI studies have shown that increased
DLPFC activity during encoding relates to successful associative memory
in  normal  controls.  Neuropsychological  patients,  including  depression
patients,  show  disorders  in  DLPFC  activity,  along  with  impairments  in
associative  memory.  Transcranial  direct  current  stimulation  (tDCS)  is
a noninvasive, safe, and cost-effective form of brain stimulation that is a
useful tool for examining the causal relationship between brain areas and
cognitive functions. The present study investigated whether tDCS of the
left DLPFC would enhance associative memory in individuals with depressive 
symptoms. Subjects (depression, control) engaged in a double-blind,
two-session  (anodal,  sham)  study  where  they  studied  items  and  
completed  item  and  associative  memory  tests.  tDCS  was  administered  prior
to  memory  tasks.  For  item  memory  test,  subjects  studied  a  list  of  items
and made “old/new” recognition judgments with confi dence ratings. For
associative memory test, subjects studied word pairs and indicated if test
pairs were studied in the same pairing at study (“intact”), studied but with
different pairings (“rearranged”), or not studied (“new”). Results showed
that only the depression group showed enhanced associative memory after
anodal tDCS administration. However, such memory enhancement effects
due to tDCS were not found in item memory. Control subjects did not show
any difference due to tDCS. These fi ndings demonstrate the role of DLPFC
in associative memory and the nature of memory defi cits in depression.

Ugh. Ok. We get it. This is about memory.


Anyways, let's move on. Here's another interesting thing that I found:

I used a quick clustering algorithm to group the abstracts by the words they used. I found some categories that resemble the fields of neuroscience...kind of. The algorithm doesn't name the clusters, so I made the names up myself.


Topics and their key terms:
Topic 1: Oscillation things
STIMULATION MOTOR ALPHA THETA EEG POWER PHASE ACTION OSCILLATIONS HZ
Topic 2: Language and Speaking
LANGUAGE SENTENCES ENGLISH SENTENCE COMPREHENSION READING BILINGUALS HEMISPHERE NATIVE SPEAKERS
Topic 3: ERPology
ERP ERPS EFFECT EFFECTS RESPONSE AMPLITUDE POTENTIALS CONDITION LARGER EARLY
Topic 4: Memory and Hippocampus
MEMORY RETRIEVAL ENCODING WORKING EPISODIC RECOGNITION HIPPOCAMPUS HIPPOCAMPAL INFORMATION WM
Topic 5: Actual Cognitive Neuroscience
COGNITIVE TASK ET AL USING TASKS NEUROSCIENCE PERFORMANCE RESEARCH INDIVIDUAL
Topic 6: Reward or something?
TASK INFORMATION REWARD CONTROL RESPONSE COGNITIVE CUES TRIALS MODEL PREDICTION
Topic 7: Learning (and sleep?)
LEARNING TRAINING SLEEP LEARNED GROUP NOVEL SESSION PERFORMANCE TEST NEW
Topic 8: Old People and Young People
ADULTS AGE OLDER CHILDREN PERFORMANCE YOUNG FUNCTION YEARS DEVELOPMENT ABILITY
Topic 9: Vision and Attention
VISUAL ATTENTION STIMULI LOCATION OBJECTS SPATIAL OBJECT PRESENTED TARGET REPRESENTATIONS
Topic 10: Emotional Emotions
EMOTIONAL EMOTION STRESS NEUTRAL NEGATIVE STIMULI ANXIETY AFFECTIVE COGNITIVE POSITIVE
Topic 11: Cortical Networks
NETWORK CONNECTIVITY FRONTAL NETWORKS GYRUS PREFRONTAL INFERIOR ANTERIOR TEMPORAL PARIETAL
Topic 12: Words
WORDS WORD SEMANTIC EFFECT EFFECTS PRESENTED INCONGRUENT RECOGNITION PRIMING LEXICAL
Topic 13: Faces
FACES SOCIAL FACE ASD PERCEPTION FACIAL INDIVIDUALS STIMULI PRESENTED AUTISM
Topic 14: Brain Structure
CONTROLS MATTER COGNITIVE VOLUME HEALTHY WHITE CORTICAL INDIVIDUALS DISEASE STRUCTURAL
Topic 15: Hearing
AUDITORY SPEECH FEEDBACK MUSIC PERCEPTION PITCH TEMPORAL SOUNDS PERCEPTUAL FREQUENCY

Torben. What the hell am I looking at?

Good question, dear reader. I used Latent Dirichlet Allocation (LDA) https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation to sort all of the abstracts into categories based on their content. Basically, I told the algorithm to split the abstracts into 15 groups (fifteen being a completely arbitrary number). LDA treats each group, or topic, as a probability distribution over words and each abstract as a mixture of topics, so the groups are defined by which words tend to show up together. For each group I displayed the 10 words that most characterize it, with the most characteristic word first.
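The post doesn't say which LDA implementation was used. To show what the algorithm does, here is a minimal collapsed Gibbs sampler for LDA in plain Python. It stands in for whatever library produced the table above. The hyperparameters (alpha=0.1, beta=0.01, 200 sweeps) are assumed defaults, not the settings behind the table. A run on 902 abstracts would be slow in pure Python, so libraries such as scikit-learn or gensim are the practical choice.

```python
import random


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to `docs` (lists of word tokens) by collapsed Gibbs sampling.

    Returns the vocabulary plus the document-topic and topic-word count
    tables after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * V for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Take this token out of the counts...
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # ...and resample its topic: p(k) is proportional to
                # (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=10):
    """The n highest-count words in each topic, most characteristic first."""
    tops = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        tops.append([vocab[v] for v in ranked[:n]])
    return tops
```

`fit_lda(docs, n_topics=15)` followed by `top_words(vocab, topic_word, n=10)` gives rows shaped like the key-term lists above. With a different random seed, the topics come out in a different order.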

It’s important to point out that LDA is sensitive to the parameters I set behind the scenes, such as the number of topics. For example, I reran the same analysis with different numbers of categories to see how that would affect the output. That shuffled and mixed the categories around a little, but the gist was the same.
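One way to check that the gist survives a rerun is to match topics between two runs by how much their top-word lists overlap. Here is a sketch using Jaccard overlap, a simplification rather than the method used in the post. It finds the best partner for each topic independently, so two topics from one run can map to the same topic in the other.

```python
def match_topics(run_a, run_b):
    """For each topic in run A, find the most similar topic in run B.

    Each run is a list of top-word lists, one per topic. Similarity is the
    Jaccard overlap of the two top-word sets. Returns (a_index, b_index, score).
    """
    def jaccard(x, y):
        x, y = set(x), set(y)
        union = x | y
        return len(x & y) / len(union) if union else 0.0

    matches = []
    for i, words_a in enumerate(run_a):
        scores = [jaccard(words_a, words_b) for words_b in run_b]
        j = max(range(len(run_b)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches
```

Low best-match scores flag topics that genuinely changed between runs, as opposed to topics that were only renumbered.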


Using LDA, we can also plug an abstract back into the model and see which categories it fits. Here’s how mine did:


It looks like my abstract fits into the Actual Cognitive Neuroscience, Hearing(?), and Oscillation categories. I thought it would land more squarely in the Oscillation category, but I used a lot of 'Cognitive' words, plus words like 'perception' and 'frequency' from the Hearing category, so it makes sense that it falls into a couple of categories.
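Plugging a new abstract into a fitted model is usually done by holding the topics fixed and resampling only the new abstract’s topic assignments, sometimes called folding in. The sketch below assumes `phi` is a list of per-topic word-probability dicts taken from a fitted model. The sampler settings are illustrative, and library implementations typically use their own inference method instead.

```python
import random


def infer_topic_mixture(doc, phi, alpha=0.1, n_iter=100, burn_in=50, seed=0):
    """Estimate the topic mixture of one new document, holding the topics fixed.

    `doc` is a list of word tokens; `phi` is a list with one dict per topic
    mapping word -> probability. Words with zero probability in every topic
    are ignored.
    """
    if n_iter <= burn_in:
        raise ValueError("n_iter must be larger than burn_in")
    rng = random.Random(seed)
    K = len(phi)
    words = [w for w in doc if any(topic.get(w, 0.0) > 0 for topic in phi)]
    if not words:
        return [1.0 / K] * K  # nothing to go on: fall back to a uniform mixture
    z = [rng.randrange(K) for _ in words]
    counts = [z.count(k) for k in range(K)]
    mixture_sum = [0.0] * K
    for sweep in range(n_iter):
        for i, w in enumerate(words):
            counts[z[i]] -= 1
            # Topic-word probabilities stay fixed; only this document's counts move.
            weights = [(counts[k] + alpha) * phi[k].get(w, 0.0) for k in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            counts[z[i]] += 1
        if sweep >= burn_in:
            # Average the smoothed topic proportions over post-burn-in sweeps.
            denom = len(words) + K * alpha
            for k in range(K):
                mixture_sum[k] += (counts[k] + alpha) / denom
    n_kept = n_iter - burn_in
    return [s / n_kept for s in mixture_sum]
```

The result sums to 1, so it can be read directly as "this abstract is X% topic 1, Y% topic 2, ...", which is the kind of breakdown described above.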


What does all of this tell us?

First of all, it’s pretty cool that the categories I found mostly reflect the CNS submission categories. You can clearly see canonical fields of study like memory and vision reflected in these clusters. But there's more here. I think it's a funny Easter egg that ET and AL showed up in the Cognitive category. For those of you who don't know, "et al." is shorthand for a long author list, used when citing a paper with many authors (as in "Smith et al."). This suggests that much larger research groups are working on these kinds of problems. Hmm. Weird.

One idea from this analysis is that conferences could run LDA on the previous year's abstracts to decide which categories to use for poster sessions. For example, there are people really interested in the structure of the brain, but there wasn’t a session dedicated to just that. It would be cool if the structure of science could update itself in real time.

But what is super, super interesting is that the people studying oscillations came out as the first topic. This means that the vocabulary they use is the most specific. (The distribution of topics has a Dirichlet prior, so it tends to look roughly log-normal.) It’s literally like they are speaking a different language from everybody else! This means that very few people in other fields talk about things like oscillations and the power spectrum in the context of their own work, at least in their abstracts. My work was one of the few posters that partly fell into the oscillation category, and anecdotally, I connected with only a few other people interested in these kinds of things at this conference.
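With a symmetric prior, LDA has no built-in topic order: swapping two topic labels gives an equally good model. Which topic gets printed first may therefore depend on the implementation. A more direct way to check a claim like this is to measure it. The sketch below computes each topic's share of all tokens. It also computes a simple exclusivity score: on average, what fraction of each top word's uses fall inside that topic. Both inputs are count tables of the kind a Gibbs sampler keeps. The exclusivity score is a hand-rolled illustration, not a standard named metric.

```python
def topic_prevalence(doc_topic):
    """Share of all tokens assigned to each topic (doc_topic holds token counts)."""
    totals = [sum(col) for col in zip(*doc_topic)]
    grand = sum(totals)
    return [t / grand for t in totals]


def topic_exclusivity(topic_word, n=10):
    """Average share of each top word's total count that lands in that topic.

    Scores near 1 mean the topic's characteristic words are rarely used by
    other topics, i.e. a vocabulary few other abstracts share.
    """
    word_totals = [sum(col) for col in zip(*topic_word)]
    scores = []
    for row in topic_word:
        top = sorted(range(len(row)), key=lambda v: row[v], reverse=True)[:n]
        shares = [row[v] / word_totals[v] for v in top if word_totals[v] > 0]
        scores.append(sum(shares) / len(shares) if shares else 0.0)
    return scores
```

A topic with low prevalence but high exclusivity would match the "different language" picture: few abstracts use it, and its words rarely show up elsewhere.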


The Upshot

The lesson here is that it’s often helpful to take a step back and look at the big picture. Science happens in a social and linguistic context, and not enough of the scientists who participate in this system recognize that. Doing meta-analyses of our process can show us things that aren't evident at smaller scales. In this case, scientists at this 'cognitive neuroscience' conference seemed more interested in memory than in cognition, and only a few people were using an 'oscillation' vocabulary in their work. It would behoove us all to be mindful of how we structure our science, communicate our ideas, and listen to other perspectives.

If you're interested, the cleaned abstract metadata and some of the interesting parts of the code can be found here: https://github.com/torbenator/cns_analysis
