Jesters, Koshare clowns and jokes generated with Machine Learning.

Yazmin T. Montana
Mar 27, 2023
7 min read

Archetype, derived from the ancient Greek language, refers to a universal symbol that serves as the original pattern for more specific symbols. The interpretation of archetypes varies depending on the field of study. For instance, in psychology, archetypes are considered to be models that exist within the human psyche. On the other hand, in philosophy, archetypes represent the ideal forms of specific objects. The philosopher Plato, from ancient Greece, introduced the concept of pure and perfect forms, also known as Forms or archetypes, which are shared by many objects in the real world.

The study of archetypes in psychology was set in motion by Carl Gustav Jung. This Swiss psychiatrist became well known for his ideas on extroversion and introversion, as well as his fascination with religion, myth, mysticism, and alchemy. Jung believed that archetypes have their roots in the collective unconscious.

Jung's book Aion describes several main archetypes, including the Self. For Jung, the Self represents the unification of an individual's conscious and unconscious life, achieved through individuation. The mandala, a circle-shaped symbol significant in Hindu and Buddhist practices, best represents the Self. Psychologist David Fontana notes in his book that meditating with mandalas can help an individual access their unconscious and facilitate the process of individuation.

Tibetan Buddhist monks drawing a Kalachakra mandala.

Tricksters are archetypal because they appear in different cultures, yet they are expressed with the inflection of a particular culture. For example, in the Native American tradition, the coyote features as a trickster figure in that culture’s myths, whereas the trickster Loki – from Norse mythology – is a shape-shifting god and lover of mischief. In English folklore, Puck, who plays a pivotal role in Shakespeare’s A Midsummer Night’s Dream, is a trickster that takes the form of a fairy. And then we have the jester, which is commonly equated with a trickster. Nevertheless, they are quite different in nature.

By standard definitions, a jester is someone who dresses in flamboyant clothing and entertains a medieval court with jests, mockery, and jokes. On the other hand, the trickster is a mythical character known for their mischievous and playful nature. They often employ tricks to impart lessons to others through cunning means.

Jung believed that the trickster is an ancient and fundamental archetype. In his work, "On the Psychology of the Trickster-Figure," he analyzes this archetype and observes that it appears in myths from around the world. While the characteristics and representation of the trickster vary greatly, Jung notes that the rabbit embodies this archetype in African and African American storytelling, while the fox represents it in the folklore of Dogon, Scotland, Bulgaria, Russia, France, and Finland.

Many native traditions held clowns and tricksters as essential to any contact with the sacred. People could not pray until they had laughed, because laughter opens and frees from rigid preconception. Humans had to have tricksters within the most sacred ceremonies lest they forget the sacred comes through upset, reversal, surprise. The trickster in most native traditions is essential to creation, to birth. -Byrd Gibbens – Professor of English at the University of Arkansas at Little Rock

The New Mexico Pueblo Indians have sacred clowns, also called Pueblo clowns, who serve as their tricksters. These clowns, known as Koshare, have the primary responsibility of spreading joy and laughter, which are believed to have healing powers. They use their wit to highlight the absurdity of various situations, but also have a darker side, as ridicule can sometimes make people feel uncomfortable. The Koshare wear striped body paint and pointed hats, similar to medieval court jesters, during festive occasions.

The role of jesters, clowns, and buffoons in traditional tales and fables extends far beyond mere entertainment, yet modern society seems to have lost sight of this fact. These figures have been revered as spiritual mentors and guides for the youth, offering consolation and comfort to those in grief and distress. Their words of wisdom carry great weight and are held in high regard. Interestingly, despite their perceived power and influence, these figures have historically been relegated to a lower status than even the lowest servant, creating a curious paradox.

Is humor intrinsically linked to our ability to identify imperfections and tragedy?

According to analytical psychologists, there are 4 theories on humor and the trickster archetype:

Incongruity theory: Humour arises when something violates our expectations. The idea can be traced back to Kant, Kierkegaard, and even to comments made by Aristotle in the Rhetoric.
Superiority theory: According to Thomas Hobbes, humour arises from a “sudden glory” felt when we recognize our supremacy over others.
Relief theory: Sigmund Freud and Herbert Spencer saw humour as fundamentally a way to release or save energy generated by repression.
Play theory: Attempts to classify humour as a type of play.

The irrationality, the unconscious, is the worst enemy for the cognitive science in the attempts of making artificial intelligence. The mechanism “if…then” does not make sense. In “humour” we may find the entire palette of emotions: sexual, aggressive, sarcastic, hatred…all these could be humour. -Arvo Krikmann, 2009.

The Jester dataset:


The Jester database, hosted by the University of California, Berkeley, contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users, collected between April 1999 and May 2003. It has the following format:

Three data files contain anonymous ratings data from 73,421 users.
Data files are in .zip format; when unzipped, they are in Excel (.xls) format.
Ratings are real values ranging from -10.00 to +10.00 (the value "99" corresponds to "null" = "not rated").
One row per user
The first column gives the number of jokes rated by that user. The next 100 columns give the ratings for jokes 01 - 100.
The sub-matrix including only columns {5, 7, 8, 13, 15, 16, 17, 18, 19, 20} is dense. Almost all users have rated those jokes (see discussion of "universal queries" in the above paper).
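
As a rough illustration of this layout, the sketch below builds a tiny made-up frame in the same shape (count column first, 99 for "not rated"), converts 99 to missing values, and measures how densely each joke is rated. The file name in the comment and all variable names are assumptions, not part of the dataset documentation.

```python
import numpy as np
import pandas as pd

# With a real file this might be something like (hypothetical file name):
#   raw = pd.read_excel("jester-data-1.xls", header=None)
# Here a tiny made-up frame stands in: column 0 is the number of jokes
# rated, the remaining columns are ratings, and 99 means "not rated".
raw = pd.DataFrame([
    [3, -7.82, 8.79, 99.00, -8.16],
    [4, 4.08, -0.29, 6.36, 4.37],
    [1, 99.00, 8.35, 99.00, 99.00],
])

counts = raw.iloc[:, 0]                            # jokes rated per user
ratings = raw.iloc[:, 1:].replace(99, np.nan)      # 99 -> missing value
ratings.columns = range(1, ratings.shape[1] + 1)   # joke IDs start at 1

# Sanity check: the count column should match the non-missing ratings.
assert (ratings.notna().sum(axis=1) == counts).all()

# Share of users who rated each joke; a "dense" column is close to 1.0.
coverage = ratings.notna().mean()
print(coverage)
```

On the real data, the same `coverage` check would be one way to confirm which joke columns are rated by almost all users.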

A joke recommender (in general terms):

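You can start working with this dataset after installing the Surprise package for Python (distributed on PyPI as scikit-surprise). The walkthrough also relies on GridSearchCV from Surprise's model_selection module, defaultdict from Python's standard collections module, and the ever-useful numpy and pandas.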

For data preprocessing, we can begin by generating a user ID column and removing the column that indicates the number of jokes rated by each user.
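
A minimal sketch of this step, assuming the raw sheet was read without a header so that column 0 holds the rating count and columns 1 onward hold the jokes. The frame `raw` and the name `df_wide` are made up for illustration and are not taken from the original notebook.

```python
import pandas as pd

# Made-up stand-in for the raw sheet: column 0 = number of jokes rated,
# columns 1..4 = ratings for jokes 1..4 (99 = not rated).
raw = pd.DataFrame([
    [3, -7.82, 8.79, 99.00, -8.16],
    [4, 4.08, -0.29, 6.36, 4.37],
])

# Drop the count column; the remaining labels 1..4 double as joke IDs.
df_wide = raw.drop(columns=0)

# Add 1-based user IDs as the first column.
df_wide.insert(0, "User ID", list(range(1, len(df_wide) + 1)))

print(df_wide)
```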

The resulting data frame, from the notebook on GitHub:

           User ID      1      2      3      4  ...     96     97     98     99    100
    0            1  -7.82   8.79  -9.66  -8.16  ...  99.00  -5.63  99.00  99.00  99.00
    1            2   4.08  -0.29   6.36   4.37  ...  -2.14   3.06   0.34  -4.32   1.07
    2            3  99.00  99.00  99.00  99.00  ...  99.00  99.00  99.00  99.00  99.00
    3            4  99.00   8.35  99.00  99.00  ...  99.00  99.00  99.00  99.00  99.00
    4            5   8.50   4.61  -4.17  -5.39  ...   1.55   3.11   6.55   1.80   1.60
    ...        ...    ...    ...    ...    ...  ...    ...    ...    ...    ...    ...
    24978    24979   0.44   7.43   9.08   2.33  ...   9.03   6.55   8.69   8.79   7.43
    24979    24980   9.13  -8.16   8.59   9.08  ...  -8.20  -7.23  -8.59   9.13   8.45
    24980    24981  99.00  99.00  99.00  99.00  ...  99.00  99.00  99.00  99.00  99.00
    24981    24982  99.00  99.00  99.00  99.00  ...  99.00  99.00  99.00  99.00  99.00
    24982    24983   2.43   2.67  -3.98   4.27  ...  99.00  99.00  99.00  99.00  99.00

The data frame is then reshaped to long format, with one column each for user ID, joke ID, and rating, instead of one column per joke:

       User ID Joke ID  Rating
    0        1       1   -7.82
    1        2       1    4.08
    2        3       1   99.00
    3        4       1   99.00
    4        5       1    8.50

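One way to get from the wide layout to the long one is pandas' `melt`. The sketch below is an illustration on a made-up three-user frame, not the notebook's own code. Dropping the rows with a rating of 99 is an assumption on my part: those entries mean "not rated" and fall outside the -10 to +10 scale.

```python
import pandas as pd

# Made-up wide frame: one row per user, one column per joke.
df_wide = pd.DataFrame({
    "User ID": [1, 2, 3],
    1: [-7.82, 4.08, 99.00],
    2: [8.79, -0.29, 99.00],
})

# Wide -> long: one row per (user, joke) pair.
df = df_wide.melt(id_vars="User ID", var_name="Joke ID", value_name="Rating")

# Assumption: 99 marks "not rated", so those rows are removed before
# the ratings are handed to a recommender.
df = df[df["Rating"] != 99].reset_index(drop=True)

print(df)
```
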
Lastly, the data frame is passed into a Surprise dataset, which is the data type required for building a recommendation system with the Surprise library.

    from surprise import Dataset, Reader

    reader = Reader(rating_scale=(-10, 10))

    # columns must be passed into this method in the order: user (raw) ids, item (raw) ids, ratings
    data = Dataset.load_from_df(df[['User ID', 'Joke ID', 'Rating']], reader)

The chosen algorithm for this task is the k-nearest neighbors with means. This collaborative filtering approach leverages the mean ratings of each user to find the k most similar users and recommends items based on their preferences. Collaborative filtering aims to identify users with similar tastes and preferences, and suggest items that were positively rated by those similar users.
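
To make the idea of "k-nearest neighbors with means" concrete, here is a simplified, library-free sketch of a mean-centered estimate: each neighbor contributes its deviation from its own mean, weighted by similarity, and the result is added to the target's mean. The function name, the tuple layout, and the default of k=40 are assumptions for illustration; this is not the Surprise implementation.

```python
def predict_with_means(target_mean, neighbors, k=40):
    """Mean-centered k-NN estimate.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean) tuples.
    """
    # Keep the k most similar neighbors that have positive similarity.
    usable = [n for n in neighbors if n[0] > 0]
    top = sorted(usable, key=lambda n: n[0], reverse=True)[:k]
    if not top:
        return target_mean  # no usable neighbors: fall back to the mean

    # Similarity-weighted average of each neighbor's deviation from its mean.
    weighted_dev = sum(sim * (rating - mean) for sim, rating, mean in top)
    total_sim = sum(sim for sim, _, _ in top)
    return target_mean + weighted_dev / total_sim


# Made-up numbers: the target's mean rating is 1.0 and only the two most
# similar neighbors are used.
neighbors = [(0.9, 5.0, 2.0), (0.5, -1.0, 0.0), (0.1, 8.0, 7.0)]
estimate = predict_with_means(1.0, neighbors, k=2)
print(round(estimate, 2))
```

Centering on means corrects for raters (or jokes) that sit systematically high or low on the scale, so the neighbors only contribute how far above or below their usual level a rating is.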

The boolean option user_based is set to False so that similarities are computed between items (jokes) rather than between users. Two similarity measures are tried, msd and cosine, which compute the mean squared difference similarity and the cosine similarity, respectively, between all pairs of items. The min_support option sets the minimum number of common users required for a similarity not to be zero; the values 3, 4, and 5 are tried. Performance is evaluated with root mean squared error (RMSE) and mean absolute error (MAE).
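
The following sketch shows, in plain Python, what the two similarity measures and the minimum-support cutoff compute for a pair of jokes. It follows the common textbook definitions (cosine over co-rated entries, and 1 / (MSD + 1) for the MSD similarity); the function names and the dictionary layout are assumptions, and the code is not taken from the library.

```python
import math


def common_ratings(item_a, item_b):
    """Rating pairs from users who rated both items (dicts: user -> rating)."""
    users = item_a.keys() & item_b.keys()
    return [(item_a[u], item_b[u]) for u in users]


def msd_similarity(item_a, item_b, min_support=3):
    pairs = common_ratings(item_a, item_b)
    if len(pairs) < min_support:
        return 0.0  # too few common users: similarity is set to zero
    msd = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return 1.0 / (msd + 1.0)  # the +1 avoids division by zero


def cosine_similarity(item_a, item_b, min_support=3):
    pairs = common_ratings(item_a, item_b)
    if len(pairs) < min_support:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in pairs))
    norm_b = math.sqrt(sum(b * b for _, b in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up ratings: users 1-3 rated both jokes, user 4 rated only joke_1.
joke_1 = {1: 2.0, 2: 4.0, 3: -6.0, 4: 1.0}
joke_2 = {1: 1.0, 2: 2.0, 3: -3.0}

print(cosine_similarity(joke_1, joke_2))
print(msd_similarity(joke_1, joke_2))
print(msd_similarity(joke_1, joke_2, min_support=4))
```

The example also shows why the two measures can disagree: the two jokes are rated in exactly the same proportions, so the cosine similarity is at its maximum, while the MSD similarity is low because the absolute ratings differ.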

    from surprise import KNNWithMeans
    from surprise.model_selection import GridSearchCV

    # determining the optimal algorithm parameters with GridSearchCV
    sim_options = {
        "name": ["msd", "cosine"],
        "min_support": [3, 4, 5],
        "user_based": [False],
    }

    param_grid = {"sim_options": sim_options}

    gs = GridSearchCV(KNNWithMeans, param_grid, measures=["rmse", "mae"], cv=5)
    gs.fit(data)

    print(gs.best_score["rmse"])
    print(gs.best_params["rmse"])

Output:

    4.1838836032075175
    {'sim_options': {'name': 'cosine', 'min_support': 3, 'user_based': False}}

According to the results obtained, the best set of parameters for the recommender is to use cosine similarity as a similarity measure with a minimum support of 3 based on root mean square error. The recommender is then trained using these parameters.


    # Assumed training step (not shown in the original): refit the best
    # estimator found by the grid search on the full dataset.
    algo = gs.best_estimator["rmse"]
    algo.fit(data.build_full_trainset())

    uid = 1  # raw user id (as in the ratings file)
    iid = 1  # raw item id (as in the ratings file)

    # get a prediction for specific users and items
    pred = algo.predict(uid, iid, r_ui=-7.82, verbose=True)

    user: 1          item: 1          r_ui = -7.82   est = -3.43   {'actual_k': 40, 'was_impossible': False}

    uid = 24983  # raw user id (as in the ratings file)
    iid = 87     # raw item id (as in the ratings file)

    # get a prediction for specific users and items
    pred = algo.predict(uid, iid, r_ui=7.23, verbose=True)

    user: 24983      item: 87         r_ui = 7.23   est = 4.93   {'actual_k': 40, 'was_impossible': False}

The trained algorithm predicts the ratings for user 1 and joke 1 as -3.43, while the actual rating is -7.82. Similarly, for user 24983 and joke 87, the predicted rating is 4.93, whereas the actual rating given is 7.23.

In both examples the prediction has the same sign as the actual rating, so the algorithm captures whether the user liked the joke, but the values are off by 4.39 and 2.30 points respectively on a scale that spans 20 points. To see where the recommender struggles most, the jokes can be ranked by prediction error.
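
A small sketch of such a ranking, in plain Python. The first two rows reuse the actual and predicted ratings quoted above; the third row's prediction is made up so that one joke has more than one error to average. The tuple layout and variable names are assumptions.

```python
import math
from collections import defaultdict

# (user_id, joke_id, actual_rating, predicted_rating)
predictions = [
    (1, 1, -7.82, -3.43),     # from the article
    (24983, 87, 7.23, 4.93),  # from the article
    (2, 1, 4.08, 3.08),       # hypothetical prediction, for illustration
]

# Collect absolute errors per joke.
errors_by_joke = defaultdict(list)
for user_id, joke_id, actual, predicted in predictions:
    errors_by_joke[joke_id].append(abs(actual - predicted))

# Mean absolute error per joke, worst first.
mean_abs_error = {j: sum(e) / len(e) for j, e in errors_by_joke.items()}
worst_first = sorted(mean_abs_error.items(), key=lambda kv: kv[1], reverse=True)

# Overall MAE and RMSE over the same predictions.
all_errors = [abs(a - p) for _, _, a, p in predictions]
mae = sum(all_errors) / len(all_errors)
rmse = math.sqrt(sum(e * e for e in all_errors) / len(all_errors))

print(worst_first)
print(mae, rmse)
```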

Upon analyzing the algorithm's predictions, it is evident that the recommender-generated ratings closely match the actual ratings for the best predictions, with a negligible error of 0.000008. However, the error reaches a maximum of 18.83 in the worst predictions. Using this algorithm, recommendations can be generated for each user.
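
Generating recommendations then comes down to picking, for each user, the jokes with the highest estimated ratings. The sketch below does this with `defaultdict` on plain tuples; Surprise's own prediction objects carry more fields, and the function name, tuple layout, and sample values here are made up for illustration.

```python
from collections import defaultdict


def top_n(predictions, n=3):
    """Map each user to their n jokes with the highest estimated rating.

    predictions: iterable of (user_id, joke_id, estimated_rating) tuples,
    ideally for jokes the user has not rated yet.
    """
    per_user = defaultdict(list)
    for user_id, joke_id, est in predictions:
        per_user[user_id].append((joke_id, est))

    return {
        user: sorted(items, key=lambda x: x[1], reverse=True)[:n]
        for user, items in per_user.items()
    }


# Made-up estimates for two users.
sample = [(1, 5, 3.2), (1, 7, 8.1), (1, 8, -2.0), (2, 5, 6.0), (2, 7, 1.5)]
recommendations = top_n(sample, n=2)
print(recommendations)
```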

Can Machine Learning go beyond recommendation algorithms?

Despite the advancements in artificial intelligence and machine learning, many experts believe that generating original jokes is still beyond the capability of machines. According to analytical psychologists, a sense of humor is a uniquely human trait that develops through cultural and social experience. Machines can learn to recognize patterns and generate responses based on data, but they lack the creativity and emotional understanding needed to create truly funny jokes. Recommendation algorithms can help machines select existing jokes that match a user's preferences, but they cannot replicate the nuances that make humor so human. As such, humans will likely remain the primary source of comedy for the foreseeable future.

Patrick Nagatani, Artist. Koshare/Tewa Ritual Clowns, Missile Park, White Sands Missile Range, New Mexico, 1991

A monk once asked the Master: “Has a dog a Buddha nature too?” Whereupon the Master replied, “Woof!” - Carl Jung (1939)