tldr: We propose InterpreXis, a novel approach to finding human-interpretable concepts inside contextual word embeddings. InterpreXis involves training linear classifiers to identify interpretable axis groups, which can be used for
downstream tasks such as text classification and visualization.
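For a concrete picture of the linear-classifier step before opening the notebook, here is a rough, hedged sketch of the general probing idea on stand-in data. The variable names, shapes, and use of scikit-learn are our illustrative assumptions, not the repo's actual implementation.

```python
# Hypothetical sketch of the linear-probing idea: fit a linear classifier on
# contextual embeddings and read off the axes it relies on. Shapes, names,
# and scikit-learn usage are assumptions, not the repo's actual code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))  # stand-in for DistilBERT embeddings
labels = rng.integers(0, 2, size=200)     # stand-in concept labels (e.g., "animal" vs. not)

probe = LogisticRegression(max_iter=1000).fit(embeddings, labels)

# Axes with the largest-magnitude weights are the most predictive of the
# concept; grouping them gives candidate interpretable axis groups.
top_axes = np.argsort(-np.abs(probe.coef_[0]))[:10]
print("candidate axes:", top_axes.tolist())
```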
- duplicate `secrets_example.json` and rename the copy to `secrets.json`
- copy and paste your OpenAI key into `secrets.json` (a hedged loading sketch appears after this list)
- open `final_pipeline.ipynb`
- run the first few cells until you reach the cell that sets the classification category, then change the classification to your desired category (animals, art, cities, clinical); a sketch of such a cell appears after this list
- do not run the "create dataset" or "token and create DistilBERT embeddings" sections; scroll past them to the next section and continue from there
- run the rest of the notebook and everything should proceed smoothly!
- the final dataset is located in `data/final_data.csv` (if you download the whole repo, it should be detected automatically when running `final_pipeline.ipynb`); a quick loading example appears after this list
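The exact schema of `secrets_example.json` is not reproduced in this README; as a hedged illustration of the key-loading step in the first bullet, the notebook presumably does something along these lines (the field name `OPENAI_API_KEY` is an assumption, so check `secrets_example.json` for the actual key name):

```python
# Hypothetical sketch of how the notebook might read the key from secrets.json.
# The field name "OPENAI_API_KEY" is an assumption; check secrets_example.json
# for the actual schema.
import json

with open("secrets.json") as f:
    secrets = json.load(f)

openai_api_key = secrets["OPENAI_API_KEY"]
```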
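The classification cell referenced above is not shown here; a hypothetical version of it (variable name assumed, categories taken from the list above) would look like:

```python
# Hypothetical classification cell: the variable name is assumed, but the
# categories come from the list above.
classification = "animals"  # or "art", "cities", "clinical"
```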
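If you want to inspect `data/final_data.csv` outside the notebook, a minimal check (assuming pandas is installed) is:

```python
# Quick sanity check on the final dataset; pandas is assumed to be installed.
import pandas as pd

df = pd.read_csv("data/final_data.csv")
print(df.shape)
print(df.head())
```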
the main files you will need to run our final code are described above. a brief summary of the repo structure is included below:
- `data/`: this folder contains the various files we used to construct our final dataset, as well as other datasets we experimented with while developing our methodology
- `img/`: this folder contains figures generated by our code to show the results of different experiments
- `outputs/`: this folder contains files with the textual output from running our code (e.g., LLM outputs, statistics, etc.)
- `new-method-exp/`: this folder contains files with earlier experiments/iterations of our final methodology
- `old-method-exp/`: this folder contains files with the experiments/code needed to run our initial methodology (see Sec. 3 of our paper)