San Diego Law’s Call for Participants

On June 17, the University of San Diego Law School will present the “Workshop on Legal Text, Document and Corpus Analytics,” aka LTDCA-2016. The school has announced its “First Call for Participation”; the event will be held on its campus, sponsored by USD’s Center for Computation, Mathematics and Law.

Why should you be interested? It’s my pleasure to turn the mic over to Jana Sukkarieh, of Noha Z, to address that question. Sukkarieh focuses on the use of “machine and human intelligence for innovation, problem solving and product development.” She is based in New York, and previously worked as Chief Scientist at Counselytics and as a Research Scientist at Educational Testing Service. Among her other credentials, she is an investor and consultant, and was Senior Analyst and Investigator at Secerno, which was acquired by Oracle.


The mic is yours, Jana:

Earlier this month, I attended ALM’s Legaltech New York 2016. The topics mostly clustered around management, e-discovery, cybersecurity and privacy, including redaction systems.

I found most interesting the technologies that are working on analyzing social media data; speech-to-text legal applications; and finding relevant documents by elimination (e-discovery by irrelevancy rather than relevancy, if you wish).

Others are working on knowledge representation and inferencing; documents written in languages other than English; efficient topic clustering; and questions about whether precision and recall are indeed the right measures. (Thank you, Bill Dimm, CEO of Hot Neuron, for spending time with me to converse about evaluation and metrics).
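For readers unfamiliar with these metrics, here is a minimal sketch (my own illustration, not from the conversation above) of how precision and recall are typically computed for a document-review task. The document IDs are hypothetical.

```python
def precision_recall(predicted_relevant, truly_relevant):
    """Return (precision, recall) given two sets of document IDs.

    Precision: of the documents the system flagged, how many were
    actually relevant. Recall: of the truly relevant documents, how
    many the system found.
    """
    true_positives = predicted_relevant & truly_relevant
    precision = len(true_positives) / len(predicted_relevant) if predicted_relevant else 0.0
    recall = len(true_positives) / len(truly_relevant) if truly_relevant else 0.0
    return precision, recall

# Hypothetical review: the system flags four documents; reviewers
# judged three documents relevant, two of which the system caught.
system_flagged = {"doc1", "doc2", "doc3", "doc4"}
human_relevant = {"doc2", "doc3", "doc5"}
p, r = precision_recall(system_flagged, human_relevant)
# p = 2/4 = 0.5, r = 2/3
```

The debate referenced above is whether two numbers like these capture what matters in review, e.g. the cost of the one relevant document ("doc5") the system never surfaced.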

Last October, I researched and created a list of some of the existing legal technology, for Counselytics, where we used artificial intelligence for contractual “legalese.”  According to the Legal Trek forum in London last year, there are around 500 legal technology startups worldwide. It might be an intimidating number for newcomers, but as Monica Bay has preached, “You do not have to be the first but aim to be the best.”

Out of the 60+ startups I researched, I found that legal technologies offer many types of specializations and goals. For example, some focus on patents, others on contracts; some address pricing, others transparency. Platforms also vary, from whether they are mobile-enabled to their mode of functioning (collaborative, networked, interactive or not).

One of the challenges everyone could face, aside from security and privacy, is that documents might be multi-modal—consisting of a combination of text, images, and hyperlinks—or might not always be machine readable.

What I missed from the research and conversations, especially in a regulated domain like law (and because it is important to be “the best”), is a way to compare technologies to see which ones are the best fit for the need.

This is especially obvious in the wildly competitive and over-populated arena of e-discovery technologies. Decision-makers (be they CIOs, litigation leaders, or lawyers) should be able to compare similar available technologies, and decide which technology is best for their organization’s purposes.

As a potential purchaser, I have not yet found any satisfying answers from any e-discovery vendor on “what makes you better than the rest?” But of course, this issue does not just apply to e-discovery.

What I do not see being discussed, at least in the startup world, is that the legal technology discipline could have very high stakes—more than many traditional artificial intelligence or text mining applications.

In sales or search engine results, the repercussions of a wrong result might be trivial. But in legal, the stakes are high and bad results can be fatal to a case (or firm). All the more reason for confidence measures and transparent justifications.

Another challenging issue is that sometimes—particularly in legal, official or governmental data—what is not written or available in the text is often more important than what is. How can AI and text mining technologies reach a stage of reading between the lines?

Maybe one way forward—taking into consideration automatic text mining and understanding limitations—is to analyze other documents coupled with the original document; or other available legalese-specific knowledge.

One thing is certain: this is a very exciting area for both industry and academia. There is much to be explored, discussed and achieved.

That’s one reason to contribute and/or participate in the USD workshop, organized by Karl Branting (Principal AI Engineer at MITRE) and Ted Sichelman (Professor of Law at USD). The deadline for paper and demo abstract submissions is April 29.

Please refer to the call for papers for additional details, and see below for conference info.

The Workshop on Legal Text, Document and Corpus Analytics

“Recent improvements both in Human Language Technology (HLT) and in techniques for storage and rapid analysis of large data collections have created new opportunities for automated interpretation of legal text, improved access to statutory and regulatory rules, and greater insights into the structure and evolution of legal systems,” the school said.

“These techniques hold promise for the courts, legal practitioners, scholars, and citizens alike. These advances have coincided with a rapid expansion of interest in automated processing and understanding of legal texts on the part of industry, government agencies, court personnel, and the public.”

The workshop will address “research ideas and practical developments that involve interpretation of legal text, analysis of structured legal documents, improved publication and access to document collections, predictive analysis based on legal text mining, and visualization of legal corpora,” the school explains.

Among the topics:
• Big Data techniques (legal and financial): data mining, machine learning.
• Network models of statutory and case law, including visualization techniques specialized for legal applications.
• Global, emergent and dynamic properties of legal text collections (e.g., modularity, language models, complexity).
• Improving public access (e.g., statutory and regulatory rule sets).
• Legal question-answering systems.
• Legal document analysis, including semantic analysis, information extraction, abstraction, summarization, topic modeling, coreference resolution, and document-structure analysis.
• Legal predictive and descriptive models (e.g., probability of success of a motion/claim; expected case duration; settlement value; expected consequences of alternative litigation decisions).

Target Audience
Researchers and practitioners from industry, academia and government working at the intersection of HLT, artificial intelligence, social science, data and network science, and law.

Monica Bay is a Fellow at CodeX: The Stanford Center for Legal Informatics